Psychometrika 


VOLUME XXI—1956 
JANUARY-DECEMBER 





Editorial Council 


Chairman: HAROLD GULLIKSEN
Editors: DOROTHY C. ADKINS, PAUL HORST
Managing Editor: LYLE V. JONES
Assistant Managing Editor: B. J. WINER


Editorial Board 


DOROTHY C. ADKINS      HENRY E. GARRETT         IRVING LORGE
R. L. ANDERSON         LEO A. GOODMAN           QUINN McNEMAR
T. W. ANDERSON         BERT F. GREEN            GEORGE A. MILLER
J. B. CARROLL          J. P. GUILFORD           WM. G. MOLLENKOPF
H. S. CONRAD           HAROLD GULLIKSEN         LINCOLN E. MOSES
L. J. CRONBACH         PAUL HORST               GEORGE E. NICHOLSON
E. E. CURETON          ALSTON S. HOUSEHOLDER    M. W. RICHARDSON
PAUL S. DWYER          LLOYD G. HUMPHREYS       R. L. THORNDIKE
ALLEN EDWARDS          TRUMAN L. KELLEY         LEDYARD TUCKER
MAX D. ENGELHART       ALBERT K. KURTZ          D. F. VOTAW, JR.
WM. K. ESTES           FREDERIC M. LORD





PUBLISHED QUARTERLY 


By THE PSYCHOMETRIC SOCIETY 
AT 1407 SHERWOOD AVENUE 
RICHMOND 5, VIRGINIA 













Psychometrika 


A JOURNAL DEVOTED TO THE DEVELOPMENT
OF PSYCHOLOGY AS A
QUANTITATIVE RATIONAL SCIENCE


















































THE PSYCHOMETRIC SOCIETY - ORGANIZED IN 1935


VOLUME 21, NUMBER 1
MARCH 1956





























Psychometrika, the official journal of the Psychometric Society, is devoted to the
development of psychology as a quantitative rational science. Issued four times a year,
on March 15, June 15, September 15, and December 15.


March, 1956, Volume 21, Number 1


Published by the Psychometric Society at 1407 Sherwood Avenue, Richmond 5, Virginia. 
Second-class mail privileges authorized at Richmond, Virginia. Editorial Office, Depart- 
ment of Psychology, The University of North Carolina, Chapel Hill, North Carolina. 


Subscription Price: The regular subscription rate is $14.00 per volume. The subscriber 
receives each issue as it comes out, and, upon request, a second complete set for binding at 
the end of the year. All annual subscriptions start with the March issue and cover the calen- 
dar year. All back issues but two are available. Back issues are $14.00 per volume (one set 
only) or $3.50 per issue, with a 20 per cent discount to Psychometric Society Members. 
Members of the Psychometric Society pay annual dues of $7.00, of which $6.30 is in payment 
of a subscription to Psychometrika. Student members of the Psychometric Society pay 
annual dues of $4.00, of which $3.60 is in payment for the journal. 


Application for membership and student membership in the Psychometric Society, together 
with a check for dues for the calendar year in which application is made, should be sent to 


Leonard Kogan
105 East 22nd Street, New York 10, N. Y. 


Payments: All bills and orders are payable in advance.
Checks covering membership dues should be made payable to the Psychometric Society.


Checks covering regular subscriptions to Psychometrika (for non-members of the Psycho- 
metric Society) and back issue orders should be made payable to the Psychometric Corpora- 
tion. All checks, notices of change of address, and business communications should be sent to 


David R. Saunders, Treasurer, Psychometric Society and Psychometric Corporation
Educational Testing Service 

P.O. Box 592 

Princeton, New Jersey 


Articles on the following subjects are published in Psychometrika: 
(1) the development of quantitative rationale for the solution of psychological 
problems; 
(2) general theoretical articles on quantitative methodology in the social and bio- 
logical sciences; 
(3) new mathematical and statistical techniques for the evaluation of psychological 
data; 
(4) aids in the application of statistical techniques, such as nomographs,
work-sheet layouts, forms, and apparatus;
(5) critiques or reviews of significant studies involving the use of quantitative
techniques.
The emphasis is to be placed on articles of type (1), in so far as articles of this type are 
available. 
(Continued on the back inside cover page) 



















Psychometrika 





CONTENTS 


THE ADDITIVE CONSTANT PROBLEM IN MULTIDIMENSIONAL SCALING
     SAMUEL J. MESSICK AND ROBERT P. ABELSON
A RAPID NON-PARAMETRIC ESTIMATE OF MULTI-JUDGE RELIABILITY
     DESMOND S. CARTWRIGHT
A STUDY OF SPEED FACTORS IN TESTS AND ACADEMIC GRADES
     FREDERIC M. LORD
OPTIMAL TEST LENGTH FOR MAXIMUM DIFFERENTIAL PREDICTION
     PAUL HORST
THE REGRESSION OF GAINS UPON INITIAL SCORES
     R. F. GARSIDE
A METHOD OF SCALOGRAM ANALYSIS USING SUMMARY STATISTICS
     BERT F. GREEN
NOTE ON CARROLL’S ANALYTIC SIMPLE STRUCTURE
     HENRY F. KAISER
QUINN McNEMAR, Psychological Statistics, Second Edition
     A Review by EDWARD E. CURETON
ROBERT L. THORNDIKE AND ELIZABETH HAGEN, Measurement and Evaluation in
Psychology and Education
     A Review by JOHN E. MILHOLLAND
DECISION PROCESSES, Thrall, R. M., Coombs, C. H., and Davis, R. L. (Editors)
     A Review by R. DUNCAN LUCE


LIST OF MEMBERS OF THE PSYCHOMETRIC SOCIETY








VOLUME TWENTY-ONE MARCH 1956 NUMBER 1 








COOPERATIVE GRADUATE SUMMER SESSIONS IN STATISTICS 


The University of Florida, North Carolina State College, Virginia Poly- 
technic Institute, and the Southern Regional Education Board are jointly 
sponsoring a series of cooperative summer sessions in statistics. 

The third of these summer sessions will be held at North Carolina State 
College, June 11-July 20, 1956. A session is scheduled to be held at Virginia 
Polytechnic Institute in 1957 and at the University of Florida in 1958. Each 
summer session lasts six weeks, and each course carries approximately three 
semester hours of graduate credit. 

The 1956 session will be held jointly with the Institute in Quantitative 
Research Methods in Agricultural Economics, sponsored by the Social Science 
Research Council. Several statistics courses will be oriented towards economic 
applications. 

The combined faculty for the 1956 summer session and Institute at 
North Carolina State College will include: Professor R. L. Anderson, North 
Carolina State College; Professor Gertrude M. Cox, North Carolina State 
College; Professor David B. Duncan, University of Florida; Professor Alva 
L. Finkner, North Carolina State College; Dr. Arnold H. E. Grandage, North 
Carolina State College; Professor Robert J. Hader, North Carolina State 
College; Assistant Professor Cleon Harrell, North Carolina State College; 
Professor Earl O. Heady, Iowa State College; Professor Clifford G. Hildreth, 
Michigan State University; Professor Jack Levine, North Carolina State 
College; Professor Robert J. Monroe, North Carolina State College; and 
Assistant Professor Walter L. Smith, University of North Carolina. 

Courses to be offered this summer are: Statistical Methods I, Statistical 
Methods II (Design of Experiments), Statistical Theory I (Probability and 
Parent Distribution), Statistical Theory II (Sampling Distributions and 
Inference), Sample Survey Designs, Advanced Analysis II, Advanced Calculus 
for Statistics, Stochastic Processes, Econometric Methods and Linear Pro- 
gramming. Lectures on Linear Equations (Matrix Algebra) and Production 
Functions will be given in the Institute program. 


Inquiries should be addressed to: 


Professor J. A. Rigney 

Department of Experimental Statistics 
North Carolina State College 

Raleigh, North Carolina 














PSYCHOMETRIKA—VOL. 21, NO. 1 
MARCH, 1956 


THE ADDITIVE CONSTANT PROBLEM 
IN MULTIDIMENSIONAL SCALING* 


SAMUEL J. MESSICK†


EDUCATIONAL TESTING SERVICE
AND
ROBERT P. ABELSON


YALE UNIVERSITY 


The problem of choosing the correct additive constant to convert rela- 
tive interstimulus distances to absolute interstimulus distances in multidimen- 
sional scaling is investigated. An artificial numerical example is constructed, 
and various trial values of the constant are inserted to demonstrate the 
effect on the multidimensional map of making a variety of incorrect choices. 
Finally, a general solution to the problem, suggested by Dr. Ledyard R 
Tucker, is presented; each of the computational steps in this solution is 
set down for easy reference.


The various procedures that have been thus far described for multidimensional
scaling of stimulus objects [Torgerson (7), Attneave (2), Klingberg (4)]
all have involved what might be termed the “additive constant problem.”
A recent variation on Torgerson’s procedure [Abelson (1), Messick (5)]
also encounters the same problem. All of these procedures set up, at an early
stage of the analysis, a matrix $S_{jk}$ of scale values representing distance
estimates. Each element, $s_{jk}$, of this matrix represents the estimated
psychological distance between stimulus $j$ and stimulus $k$. Due to the nature of the
scaling procedure used, however, a constant can be added to all the distance
estimates without affecting the validity of the scale of psychological distances.
This could be expressed by


\[ s_{jk} + c = d_{jk} \quad (j \neq k), \tag{1} \]


where $s_{jk}$ is the relative distance between $j$ and $k$, and $c$ is the additive constant
which re-locates the zero point to produce $d_{jk}$, the absolute distance between
$j$ and $k$. To put it another way, the distances $s_{jk}$ are all relative; there seems
to be no way of directly determining the zero point on the distance scale.
The choice of a particular value of the additive constant will in theory
affect the subsequent multidimensional mapping of the stimuli, and thus the
investigator is faced with the problem of making the most valid choice, or,
if you prefer, the best choice according to some pragmatic criterion. The
criterion which has been previously suggested is that of minimizing the
dimensionality of the stimulus space. Torgerson gave an approximate solution
for the additive constant but did not explore the nature of the problem
deeply. The following derivation leads to an evaluation of the effect of the
constant on the multidimensional structure and provides the groundwork
for a general solution to the problem.

*This study was supported in part by Office of Naval Research Contract N6onr-270-20
and by National Science Foundation Grant G-642 to Princeton University.
†Now at Menninger Foundation.

The most systematic analytical procedure for arriving at a multidimensional
map of the stimuli from the distance matrix D is that given by Torgerson.
Each $d_{jk}$ is first squared and these squared distances are arrayed in a
matrix. Then a B matrix is calculated from this matrix of squared distances.
The multidimensional map is obtained by factoring the B matrix. This
procedure can be summarized in the following equations, which embody
some simplifications of Torgerson’s procedure. For n stimuli the elements
$b_{jk}$ of the B matrix can be written as


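\[ 2b_{jk} = \frac{1}{n}\sum_{k} d_{jk}^{2} + \frac{1}{n}\sum_{j} d_{jk}^{2}
   - \frac{1}{n^{2}}\sum_{j}\sum_{k} d_{jk}^{2} - d_{jk}^{2}
   \qquad (j = 1, 2, \ldots, n;\ k = 1, 2, \ldots, n). \tag{2} \]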





Let the matrix F be the basis for the final mapping. An element $f_{jm}$ of
F represents the projection of stimulus $j$ on axis $m$. The origin of this system
is at the centroid of the stimuli.


B = FF’. (3) 
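Equations (2) and (3) can be checked on a small case. The following Python sketch is an illustration only, not part of the original procedure: the three collinear points and the helper name `b_matrix` are assumptions. It double-centers the squared distances as in (2) and compares the result with FF' built from projections measured from the centroid.

```python
def b_matrix(d):
    """Scalar-product matrix B of equation (2) from absolute distances d[j][k]."""
    n = len(d)
    sq = [[d[j][k] ** 2 for k in range(n)] for j in range(n)]
    row = [sum(sq[j]) / n for j in range(n)]                       # (1/n) sum over k
    col = [sum(sq[j][k] for j in range(n)) / n for k in range(n)]  # (1/n) sum over j
    grand = sum(row) / n                                           # (1/n^2) double sum
    return [[0.5 * (row[j] + col[k] - grand - sq[j][k]) for k in range(n)]
            for j in range(n)]

# Hypothetical example: three points on a line at positions 0, 1 and 3.
pos = [0.0, 1.0, 3.0]
d = [[abs(a - b) for b in pos] for a in pos]
B = b_matrix(d)

# Projections on the single axis, measured from the centroid, as in (3).
centroid = sum(pos) / len(pos)
f = [p - centroid for p in pos]
FFt = [[fj * fk for fk in f] for fj in f]

max_gap = max(abs(B[j][k] - FFt[j][k]) for j in range(3) for k in range(3))
print(max_gap)  # agreement up to rounding error
```

Because the distances here are exact, B has rank one and a single axis reproduces it; with fallible distances the agreement would only be approximate.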


However, the $d_{jk}$ are not available, but the set of scale values $s_{jk}$ can
be obtained by any of the suggested scaling methods. $s_{jk}$ is related to $d_{jk}$ for
$j \neq k$ by (1). The distance $d_{jj}$ between stimulus $j$ and itself is assumed to
be zero; all $s_{jj}$ would ordinarily equal zero as well.

\[ d_{jj} = s_{jj} = 0. \tag{4} \]

The constant $c$ is to be added, then, only when $j \neq k$. One equation can
express the relationships of (1) and (4).

\[ d_{jk} = s_{jk} + c(1 - \delta_{jk}), \tag{5} \]

where

\[ s_{jj} = 0 \text{ for all } j, \tag{6} \]

and

\[ \delta_{jk} = \begin{cases} 0 & \text{when } j \neq k \\ 1 & \text{when } j = k. \end{cases} \tag{7} \]


Now the effect of the additive constant on the subsequent analysis can
be assessed by substituting (5) into (2) and simplifying with the aid of (7).
Dr. Bert F. Green, Jr., has indicated in a letter to the authors that he has
independently arrived at an equivalent derivation. This will show the
influence of c on the B matrix, although it will not directly show the influence
on the final F. A numerical example to be presented will attempt to assess
this influence. The terms of (2) are written


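\[ d_{jk}^{2} = s_{jk}^{2} + 2c\,s_{jk} + c^{2}(1 - \delta_{jk}). \tag{8} \]

Note that $(1 - \delta_{jk})^{2} = (1 - \delta_{jk})$. Note further that
$(1 - \delta_{jk})s_{jk} = s_{jk}$, since when $j \neq k$, $\delta_{jk} = 0$ and
$(1 - 0)s_{jk} = s_{jk}$; when $j = k$, $\delta_{jj} = 1$ and $(1 - 1)s_{jj} = 0 = s_{jj}$.

\[ \frac{1}{n}\sum_{k} d_{jk}^{2} = \frac{1}{n}\sum_{k} s_{jk}^{2}
   + \frac{2c}{n}\sum_{k} s_{jk} + \frac{(n - 1)}{n}c^{2}. \tag{9} \]

\[ \frac{1}{n}\sum_{j} d_{jk}^{2} = \frac{1}{n}\sum_{j} s_{jk}^{2}
   + \frac{2c}{n}\sum_{j} s_{jk} + \frac{(n - 1)}{n}c^{2}. \tag{10} \]

\[ \frac{1}{n^{2}}\sum_{j}\sum_{k} d_{jk}^{2} = \frac{1}{n^{2}}\sum_{j}\sum_{k} s_{jk}^{2}
   + \frac{2c}{n^{2}}\sum_{j}\sum_{k} s_{jk} + \frac{(n - 1)}{n}c^{2}. \tag{11} \]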


Equation (2) can now be written as 


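\[ \begin{aligned}
2b_{jk} ={}& \frac{1}{n}\sum_{k} s_{jk}^{2} + \frac{2c}{n}\sum_{k} s_{jk} + \frac{(n - 1)}{n}c^{2} \\
 &+ \frac{1}{n}\sum_{j} s_{jk}^{2} + \frac{2c}{n}\sum_{j} s_{jk} + \frac{(n - 1)}{n}c^{2} \\
 &- \frac{1}{n^{2}}\sum_{j}\sum_{k} s_{jk}^{2} - \frac{2c}{n^{2}}\sum_{j}\sum_{k} s_{jk} - \frac{(n - 1)}{n}c^{2} \\
 &- s_{jk}^{2} - 2c\,s_{jk} - c^{2}(1 - \delta_{jk}).
\end{aligned} \tag{12} \]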
Grouping terms according to the power of c yields 


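\[ \begin{aligned}
b_{jk} ={}& \frac{1}{2}\left[\frac{1}{n}\sum_{k} s_{jk}^{2} + \frac{1}{n}\sum_{j} s_{jk}^{2}
   - \frac{1}{n^{2}}\sum_{j}\sum_{k} s_{jk}^{2} - s_{jk}^{2}\right] \\
 &+ c\left[\frac{1}{n}\sum_{k} s_{jk} + \frac{1}{n}\sum_{j} s_{jk}
   - \frac{1}{n^{2}}\sum_{j}\sum_{k} s_{jk} - s_{jk}\right] \\
 &+ \frac{c^{2}}{2}\left(\delta_{jk} - \frac{1}{n}\right).
\end{aligned} \tag{13} \]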


The first four terms give the result if c is set equal to zero and are
equivalent to (2) with $s_{jk} = d_{jk}$. Examination of (13) shows that the effect of
the additive constant is felt differentially in the diagonal and off-diagonal
elements of B. Substitution of typical numbers for the $s_{jk}$ will show that the
$b_{jk}$ are often insensitive to various choices of c. The value of (13), however,
is that it will serve as a basis for a general solution of the additive constant
problem.
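The grouping by powers of c in (13) can be verified numerically. The sketch below is a hedged illustration: the 4 × 4 matrix of relative distances, the trial constant, and the helper names `center`, `b_from_parts`, and `b_direct` are all assumptions made for the example. It builds B once from the three grouped parts and once directly from (2) after applying (5), then compares the two.

```python
def center(m):
    """Double-center m: subtract row and column means, add back the grand mean."""
    n = len(m)
    row = [sum(m[j]) / n for j in range(n)]
    col = [sum(m[j][k] for j in range(n)) / n for k in range(n)]
    grand = sum(row) / n
    return [[m[j][k] - row[j] - col[k] + grand for k in range(n)] for j in range(n)]

def b_from_parts(s, c):
    """B assembled from the three bracketed groups of (13)."""
    n = len(s)
    part0 = [[-0.5 * v for v in r] for r in center([[x * x for x in r] for r in s])]
    part1 = [[-v for v in r] for r in center(s)]
    part2 = [[(1.0 if j == k else 0.0) - 1.0 / n for k in range(n)] for j in range(n)]
    return [[part0[j][k] + c * part1[j][k] + 0.5 * c * c * part2[j][k]
             for k in range(n)] for j in range(n)]

def b_direct(s, c):
    """B from (2) after converting s to d = s + c off the diagonal, as in (5)."""
    n = len(s)
    d2 = [[0.0 if j == k else (s[j][k] + c) ** 2 for k in range(n)] for j in range(n)]
    return [[-0.5 * v for v in r] for r in center(d2)]

# Hypothetical relative distances among four stimuli (symmetric, zero diagonal).
s = [[0.0, 0.0, 1.0, 0.4],
     [0.0, 0.0, 0.3, 1.2],
     [1.0, 0.3, 0.0, 0.0],
     [0.4, 1.2, 0.0, 0.0]]
c = 1.5
max_gap = max(abs(x - y)
              for r1, r2 in zip(b_from_parts(s, c), b_direct(s, c))
              for x, y in zip(r1, r2))
print(max_gap)  # the two constructions agree up to rounding error
```

Keeping the three parts separate is convenient in practice, since only the multipliers c and c²/2 change when a new trial constant is tried.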

A Numerical Example


A numerical example will help in clarifying the role of the additive
constant and in assessing its influence on the final scales. Eight points were
set up in two dimensions in the shape of a square, with one point placed in
the middle. This configuration of nine points is shown in Figure 1. The
absolute interpoint distances $d_{jk}$ for this configuration are presented in
Table 1. A set of relative interpoint distances can be obtained from Table 1
by utilizing the relationship of (5). The particular set of $s_{jk}$ chosen for this
example is found by subtracting one from every off-diagonal element of $D_{jk}$.
This procedure provides the set of relative interpoint distances $s_{jk}$ of Table 2.


                                   TABLE 1
                                    D_jk

         A       B       C       D       E       F       G       H       I
A        0       1       2       1     1.414   2.236     2     2.236   2.828
B        1       0       1     1.414     1     1.414   2.236     2     2.236
C        2       1       0     2.236   1.414     1     2.828   2.236     2
D        1     1.414   2.236     0       1       2       1     1.414   2.236
E      1.414     1     1.414     1       0       1     1.414     1     1.414
F      2.236   1.414     1       2       1       0     2.236   1.414     1
G        2     2.236   2.828     1     1.414   2.236     0       1       2
H      2.236     2     2.236   1.414     1     1.414     1       0       1
I      2.828   2.236     2     2.236   1.414     1       2       1       0

Absolute interpoint distances for the two-dimensional configuration.

                                  FIGURE 1

                                A    B    C
                                D    E    F
                                G    H    I

Two-dimensional Representation of the Absolute
Interpoint Distances, D_jk, Given in Table 1


                                   TABLE 2
                                    S_jk

         A       B       C       D       E       F       G       H       I
A        0       0       1       0      .414   1.236     1     1.236   1.828
B        0       0       0      .414     0      .414   1.236     1     1.236
C        1       0       0     1.236    .414     0     1.828   1.236     1
D        0      .414   1.236     0       0       1       0      .414   1.236
E       .414     0      .414     0       0       0      .414     0      .414
F      1.236    .414     0       1       0       0     1.236    .414     0
G        1     1.236   1.828     0      .414   1.236     0       0       1
H      1.236     1     1.236    .414     0      .414     0       0       0
I      1.828   1.236     1     1.236    .414     0       1       0       0

Relative interpoint distances obtained by scaling procedure, setting
the smallest distance equal to zero. Additive constant necessary to convert
these relative distances into absolute distances, D_jk, is c = 1.


In an actual experiment the absolute interpoint distances $d_{jk}$ are not
available. The $s_{jk}$ of Table 2 would represent the data and would be obtained
from any of the suggested scaling methods. The $s_{jk}$ selected for this problem
is the set for which the smallest scale value is placed equal to zero. The
problem for the experimenter is to estimate the constant to be added to the
$s_{jk}$ (Table 2) in order to obtain a set of $d_{jk}$ (Table 1) which minimizes the
dimensionality of the stimulus space. The $s_{jk}$ could not be used directly as
$d_{jk}$ by setting c = 0 because certain inconsistencies exist among these
distances, such as the combination $s_{AB} = 0$, $s_{BC} = 0$, and $s_{AC} = 1$. If c = 1 were
added to every off-diagonal element of Table 2, the absolute interpoint distances of
Table 1 would be obtained; these distances are exactly appropriate to the
two-dimensional configuration of Figure 1. No value of the additive constant
can be found which would yield a one-dimensional configuration. Thus
c = 1 is the value which minimizes the dimensionality of the stimulus space.
The problem of how to estimate this proper value of the constant, however,
remains to be considered.
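The construction of the example can be reproduced in a few lines. This Python sketch is an illustration under stated assumptions: the unit spacing and the coordinate assignment (rows A B C, D E F, G H I) are inferred from the tabled distances, and the variable names are hypothetical. It rebuilds the absolute and relative distances and shows one inconsistency that arises when c = 0.

```python
import math

labels = "ABCDEFGHI"
# Assumed layout: a 3 x 3 grid with unit spacing, rows A B C / D E F / G H I.
coords = [(col, 2 - row) for row in range(3) for col in range(3)]

# Absolute distances (Table 1) and relative distances (Table 2, s = d - 1 off the diagonal).
D = [[math.hypot(u[0] - w[0], u[1] - w[1]) for w in coords] for u in coords]
S = [[0.0 if j == k else D[j][k] - 1.0 for k in range(9)] for j in range(9)]

A, B, C = (labels.index(x) for x in "ABC")
# With c = 0 the relative distances cannot be true distances:
# A to B and B to C are both 0, yet A to C is 1.
violates = S[A][B] + S[B][C] < S[A][C]

for name, row in zip(labels, S):
    print(name, " ".join(f"{v:6.3f}" for v in row))
print("triangle inequality violated at c = 0:", violates)
```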

In an attempt to assess the influence of the additive constant on the
configuration of points, a $B_{jk}$ matrix was computed for each of the following
trial values of the additive constant: c = 4, 3, 2, 1, 0, -1, -2. These seven
$B_{jk}$ matrices were obtained from the $S_{jk}$ matrix of Table 2 by substituting
the successive values of c into (13). Latent roots and latent vectors were
obtained for each $B_{jk}$ (Table 3). All nine roots were extracted for each matrix,
and these values were plotted as a function of the additive constant (Figure 2).
The “correct” value of c for this $S_{jk}$ is unity, at which point there are two
positive roots and seven zero roots. As c changes, the vectors generate a
surface, so that for any two values of c, it is possible to identify corresponding
vectors. For overestimated values of c (2, 3, 4) the large roots are still easily
distinguished from the small ones, but for underestimated values (0, -1, -2)
the roots corresponding to the “true” configuration get smaller and finally
become negative. Since underestimated values of c offer this possibility of
an imaginary space, it is considerably better to overestimate c than to
underestimate it. If the configuration is plotted for various values of c (the values
for the plots are obtained from Table 3), it can be seen that as c is
overestimated more and more, the configuration becomes larger and slightly
“convex” (see Figure 3). As c is underestimated more and more the configuration
becomes smaller and “concave,” but it changes shape at a much faster
rate upon underestimation than it does upon overestimation. The configurations
for c = 0 and c = 4 are given in Figure 3. With normalized characteristic
vectors, the relative change is smaller when c is large than when c is
small. Since errors arising from the fallibility of scaling procedures would
tend to make points which really lie on a straight line deviate from this
straight line, it can be shown that the methods of estimating the additive
constant described by Torgerson would always give an underestimate.
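The behavior of the latent roots under different trial constants can be explored with a short program. The Python sketch below is an illustration only: it substitutes a cyclic Jacobi rotation routine for whatever hand method was used originally, it takes the relative distances from assumed unit-grid coordinates rather than the rounded table entries, and the function names are hypothetical.

```python
import math

def jacobi_roots(m, sweeps=30):
    """Latent roots of a symmetric matrix by cyclic Jacobi rotations."""
    n = len(m)
    a = [row[:] for row in m]
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-14:
                    continue
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                cs, sn = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    x, y = a[k][p], a[k][q]
                    a[k][p], a[k][q] = cs * x - sn * y, sn * x + cs * y
                for k in range(n):  # rotate rows p and q
                    x, y = a[p][k], a[q][k]
                    a[p][k], a[q][k] = cs * x - sn * y, sn * x + cs * y
    return sorted((a[i][i] for i in range(n)), reverse=True)

def b_matrix(s, c):
    """B of (2) with d = s + c off the diagonal, as in (5); s is symmetric."""
    n = len(s)
    d2 = [[0.0 if j == k else (s[j][k] + c) ** 2 for k in range(n)] for j in range(n)]
    row = [sum(r) / n for r in d2]
    grand = sum(row) / n
    return [[0.5 * (row[j] + row[k] - grand - d2[j][k]) for k in range(n)]
            for j in range(n)]

# Assumed 3 x 3 unit grid; relative distances are the true distances minus one.
coords = [(col, 2 - row) for row in range(3) for col in range(3)]
S = [[0.0 if u == w else math.hypot(u[0] - w[0], u[1] - w[1]) - 1.0 for w in coords]
     for u in coords]

roots = {c: jacobi_roots(b_matrix(S, c)) for c in (4, 3, 2, 1, 0, -1, -2)}
for c, r in roots.items():
    print(c, [round(x, 3) for x in r])
```

At c = 1 the distances are exactly Euclidean in two dimensions, so two roots are positive and the rest vanish; at c = 0 the inconsistency among the distances forces at least one negative root.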


A General Solution for the Additive Constant 


At the “true” solution for the additive constant there is a minimum
number of large positive roots and the rest of the roots are zero. With fallible
data, however, it should be remembered that the small roots will probably
not equal zero but will vary positively and negatively around zero. For
any symmetric matrix the sum of all the latent roots is equal to the sum of
the diagonal elements. If the sum of only the large latent roots is set equal
to the sum of the diagonals, a value of c is obtained which sets the sum of
the other roots equal to zero. Dr. Ledyard R Tucker suggested this solution
and has made this general approach practical by contributing much in the
way of matrix simplification.

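From (13) the diagonal elements of $B_{jk}$ may be written as

\[ \begin{aligned}
b_{jj} ={}& \frac{1}{2}\left[\frac{1}{n}\sum_{k} s_{jk}^{2} + \frac{1}{n}\sum_{j} s_{jk}^{2}
   - \frac{1}{n^{2}}\sum_{j}\sum_{k} s_{jk}^{2}\right] \\
 &+ c\left[\frac{1}{n}\sum_{k} s_{jk} + \frac{1}{n}\sum_{j} s_{jk}
   - \frac{1}{n^{2}}\sum_{j}\sum_{k} s_{jk}\right]
 + \frac{1}{2}c^{2}\left(1 - \frac{1}{n}\right).
\end{aligned} \tag{14} \]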
FIGURE 3

Configuration of Points Corresponding to c = 0 (first panel) and
Configuration of Points Corresponding to c = 4 (second panel)



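Note that for $j = k$, $\sum_{k} s_{jk}^{2} = \sum_{j} s_{jk}^{2}$ and
$\sum_{k} s_{jk} = \sum_{j} s_{jk}$.

\[ b_{jj} = \frac{1}{n}\sum_{k} s_{jk}^{2} - \frac{1}{2n^{2}}\sum_{j}\sum_{k} s_{jk}^{2}
   + c\left(\frac{2}{n}\sum_{k} s_{jk} - \frac{1}{n^{2}}\sum_{j}\sum_{k} s_{jk}\right)
   + \frac{1}{2}c^{2}\left(1 - \frac{1}{n}\right). \tag{15} \]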
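The sum of the diagonal elements is

\[ \sum_{j} b_{jj} = \frac{1}{n}\sum_{j}\sum_{k} s_{jk}^{2} - \frac{1}{2n}\sum_{j}\sum_{k} s_{jk}^{2}
   + c\left(\frac{2}{n}\sum_{j}\sum_{k} s_{jk} - \frac{1}{n}\sum_{j}\sum_{k} s_{jk}\right)
   + \frac{1}{2}c^{2}\,n\left(1 - \frac{1}{n}\right). \tag{16} \]

Simplifying,

\[ \sum_{j} b_{jj} = \frac{1}{2n}\sum_{j}\sum_{k} s_{jk}^{2} + \frac{c}{n}\sum_{j}\sum_{k} s_{jk}
   + \frac{1}{2}(n - 1)c^{2}. \tag{17} \]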
Let $X_1$ be the latent vector corresponding to the first latent root $\beta_1$.

\[ BX_1 = \beta_1 X_1. \tag{18} \]

If $X_1$ is normalized so that the sum of the squares of its elements is equal
to unity, then,

\[ X_1' B X_1 = \beta_1. \tag{19} \]
The sum of the large latent roots is

\[ \sum \beta = X_1'BX_1 + X_2'BX_2 + \cdots + X_p'BX_p, \tag{20} \]

where p is the number of large principal components.
Equation 13 may be rewritten in matrix notation as

\[ B = A + cE + \tfrac{1}{2}c^{2}H, \tag{21} \]


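where the elements $a_{jk}$, $e_{jk}$, and $h_{jk}$ of matrices A, E, and H, respectively, are

\[ a_{jk} = \frac{1}{2}\left(\frac{1}{n}\sum_{k} s_{jk}^{2} + \frac{1}{n}\sum_{j} s_{jk}^{2}
   - \frac{1}{n^{2}}\sum_{j}\sum_{k} s_{jk}^{2} - s_{jk}^{2}\right), \tag{22} \]

\[ e_{jk} = \frac{1}{n}\sum_{k} s_{jk} + \frac{1}{n}\sum_{j} s_{jk}
   - \frac{1}{n^{2}}\sum_{j}\sum_{k} s_{jk} - s_{jk}, \tag{23} \]

\[ h_{jk} = -\frac{1}{n} \quad (j \neq k), \quad \text{and} \quad h_{jj} = \left(1 - \frac{1}{n}\right). \tag{24} \]
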
Substituting (21) in (20), 


\[ \begin{aligned}
\sum \beta ={}& (X_1'AX_1 + cX_1'EX_1 + \tfrac{1}{2}c^{2}X_1'HX_1) \\
 &+ (X_2'AX_2 + cX_2'EX_2 + \tfrac{1}{2}c^{2}X_2'HX_2) \\
 &+ \cdots + (X_p'AX_p + cX_p'EX_p + \tfrac{1}{2}c^{2}X_p'HX_p).
\end{aligned} \tag{25} \]

\[ \sum \beta = \sum_{i} X_i'AX_i + c\sum_{i} X_i'EX_i + \tfrac{1}{2}c^{2}\sum_{i} X_i'HX_i. \tag{26} \]

In the following consideration of the matrix H, it is found that the last
term of (26) can be simplified. The diagonal elements of H are (1 - 1/n)
and the off-diagonal elements are (-1/n). Since the sum of each row or
column is equal to zero, one of the roots of the H matrix is zero. The other
(n - 1) roots are all equal to unity. The vector corresponding to the zero
root has equal coefficients, and the other (n - 1) vectors are indeterminate.
However, any vector which has a sum of coefficients equal to zero is a possible
vector for H. Since the sum of each row or column of B is also equal to zero
(it is a centroid matrix), every B matrix also has a zero root for which the
vector has equal coefficients. In order to maintain the orthogonality of the
principal vectors of B, the sum of the coefficients of each of the other vectors
must equal zero. Therefore, the vectors of B are possible vectors of H. The
vector with the vanishing root is not considered in either B or H.

\[ X_i'HX_i = \beta_{H_i} = 1. \tag{27} \]

\[ \sum_{i} X_i'HX_i = \sum_{i} \beta_{H_i} = p \quad (p = \text{number of roots taken}). \tag{28} \]
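The property of H used in (27) is easy to confirm numerically. The following Python sketch is illustrative only: the particular zero-sum vector and the helper name `quad_form` are assumptions. It evaluates X'HX for a unit-length vector whose coefficients sum to zero and for the vector with equal coefficients.

```python
import math

n = 9
# H has (1 - 1/n) on the diagonal and -1/n elsewhere, as in (24).
H = [[(1.0 if j == k else 0.0) - 1.0 / n for k in range(n)] for j in range(n)]

def quad_form(m, x):
    """Return x' m x."""
    return sum(x[j] * m[j][k] * x[k] for j in range(len(x)) for k in range(len(x)))

# Hypothetical vector whose coefficients sum to zero, scaled to unit length.
raw = [3.0, -1.0, 2.0, 0.0, -4.0, 1.0, 0.5, -2.5, 1.0]
norm = math.sqrt(sum(v * v for v in raw))
x = [v / norm for v in raw]

# Unit-length vector with equal coefficients (the vanishing root of H).
ones = [1.0 / math.sqrt(n)] * n

value_zero_sum = quad_form(H, x)   # should equal 1, as in (27)
value_equal = quad_form(H, ones)   # should equal 0
print(value_zero_sum, value_equal)
```

This is why the last term of (26) collapses to a multiple of p: each retained vector of B has zero-sum coefficients and unit length, so each contributes exactly one.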
Therefore, (26) can be written

\[ \sum \beta = \sum_{i} X_i'AX_i + c\sum_{i} X_i'EX_i + \tfrac{1}{2}p\,c^{2}. \tag{29} \]

If the sum of the large latent roots of (29) is set equal to the sum of the
diagonals of (17) to yield a quadratic in c, the solution of this quadratic
equation will give the value of c which sets the sum of the small latent roots
equal to zero.

The solution outlined in (17) to (29) involves several computational
steps. In any of the suggested scaling procedures it is necessary to obtain a
matrix of relative interpoint distances $S_{jk}$ which can be adjusted so that the
smallest interpoint distance is set equal to zero. A trial value of c must then
be selected, and it is much better to overestimate than to underestimate.
In practice either the largest value or the average value in the $S_{jk}$ matrix
could be used as a trial value. If some of the stimuli are relatively close together
these will usually provide an overestimate of c. In any event, the value of
the additive constant obtained in this solution will indicate whether c was
overestimated and is being approached from above, or whether it was
underestimated and is being approached from below.


A Numerical Illustration of the General Solution for the Additive Constant 


                                   TABLE 4
                                    B_jk

          A         B         C         D         E         F         G         H         I
A     11.44940   2.14480  -1.05060   2.14480   -.97830  -3.56305  -1.05060  -3.56305  -5.53340
B      2.14480   8.84020   2.14480   -.90150   -.54090   -.90150  -3.56305  -3.65980  -3.56305
C     -1.05060   2.14480  11.44940  -3.56305   -.97830   2.14480  -5.53340  -3.56305  -1.05060
D      2.14480   -.90150  -3.56305   8.84020   -.54090  -3.65980   2.14480   -.90150  -3.56305
E      -.97830   -.54090   -.97830   -.54090   6.07760   -.54090   -.97830   -.54090   -.97830
F     -3.56305   -.90150   2.14480  -3.65980   -.54090   8.84020  -3.56305   -.90150   2.14480
G     -1.05060  -3.56305  -5.53340   2.14480   -.97830  -3.56305  11.44940   2.14480  -1.05060
H     -3.56305  -3.65980  -3.56305   -.90150   -.54090   -.90150   2.14480   8.84020   2.14480
I     -5.53340  -3.56305  -1.05060  -3.56305   -.97830   2.14480  -1.05060   2.14480  11.44940

B_jk matrix based upon an additive constant of c = 4.


In order to try out the solution, the “data” of Table 2 were selected,
and a trial value of c = 4 was chosen. This trial value was then inserted into
(13) to obtain a B matrix (Table 4). A principal components analysis of B
yields the $X_i$ vectors (the two largest principal components for c = 4 are
given in the first two columns of Table 3). It is not necessary, however, to
extract all of the principal components, only the large ones. Inasmuch as
the centroid method is an approximation to principal axes, a centroid analysis
could be used if the number of factors extracted is not large. Iterations of
this solution will then yield closer and closer approximations to principal
components as well as closer approximations to c. The $X_i$ vectors must then
be normalized.

\[ X_1 \text{ (normalized)} = \begin{bmatrix} .39805 \\ .42792 \\ .39805 \\ 0 \\ 0 \\ 0 \\ -.39805 \\ -.42792 \\ -.39805 \end{bmatrix},
\qquad
X_2 \text{ (normalized)} = \begin{bmatrix} .39805 \\ 0 \\ -.39805 \\ .42792 \\ 0 \\ -.42792 \\ .39805 \\ 0 \\ -.39805 \end{bmatrix} \]

The values of $\sum_{j}\sum_{k} s_{jk}^{2}$ and $\sum_{j}\sum_{k} s_{jk}$ obtained from Table 2 are then
substituted in (17) to yield the sum of the diagonals of B.

\[ \sum_{j} b_{jj} = 2.91956 + 5.07911c + 4c^{2}. \tag{30} \]

Matrices A and E are constructed using (22) and (23). Substituting the
appropriate numerical values into (29) gives

\[ \sum \beta = 4.56580 + 6.41804c + c^{2}. \tag{31} \]

Equating (30) and (31) and solving, c = .997 and c = -.55.

The graph of Figure 2 indicates that it is considerably easier to distinguish
the large roots from the small ones when the sum of the latent roots is large.
The value of c desired, then, is the one which will give the highest $\sum \beta$.
Substituting the two roots into (29), the desired c is seen to be .997.

\[ \sum \beta = 4.56580 + 6.41804(.997) + (.997)^{2} = 11.96 \]
\[ \sum \beta = 4.56580 + 6.41804(-.55) + (-.55)^{2} = 1.34. \]
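The arithmetic of this step can be retraced directly from the printed coefficients. The short Python sketch below is an illustration; the variable names are hypothetical, and the coefficients are simply those given in (30) and (31). It equates the two expressions, solves the resulting quadratic, and keeps the solution giving the larger sum of large roots.

```python
import math

# Coefficients as printed in (30) and (31): constant, linear, quadratic.
diag_sum = (2.91956, 5.07911, 4.0)   # sum of the diagonals of B
root_sum = (4.56580, 6.41804, 1.0)   # sum of the p = 2 large latent roots

# (30) minus (31) gives q2*c^2 + q1*c + q0 = 0.
q0, q1, q2 = (d - r for d, r in zip(diag_sum, root_sum))
disc = math.sqrt(q1 * q1 - 4 * q2 * q0)
candidates = [(-q1 + disc) / (2 * q2), (-q1 - disc) / (2 * q2)]

def large_root_sum(c):
    """Evaluate (31) at a given value of the additive constant."""
    return root_sum[0] + root_sum[1] * c + root_sum[2] * c * c

best_c = max(candidates, key=large_root_sum)
print([round(c, 3) for c in candidates], round(best_c, 3))
```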


The correct value of the additive constant for the $S_{jk}$ of Table 2 is unity.
The solution of .997 may be considered only an approximation to this correct
value, since the $X_i$ vectors are obtained from a B matrix based on a trial
value of c. But this approximation is probably adequate without recourse
to an iterative procedure, since the latent vectors do not differ widely for
different overestimated trial values of c (see Table 3), and the coefficients of
the other equation involved in the solution (17) are independent of c.
If the obtained c is satisfactory, it can be used with (13) to produce
another B matrix, which can then be factored to yield the final F. If a more
exact solution for c were desired, the columns of F could be substituted for
the $X_i$ vectors in (29) to obtain a new equation for $\sum \beta$ as a function of c.
This new expression for $\sum \beta$ could then be set equal to the expression for
the sum of diagonals of B (17), and a closer approximation to c would be
obtained. If it became apparent that the original trial value was an
underestimate, it might be necessary to re-calculate the $X_i$ vectors with a new
trial c, because the underestimated value might have led to the choice of the
wrong dimensions or to the addition of extraneous dimensions (see Figure 2).

It is also possible, however, to obtain the F matrix without computing 
another B, by using the general matrix factoring solution (3), which is sum- 
marized in the following two equations: 


\[ (BX)K^{-1} = F, \tag{32} \]

where X is a matrix of the $X_i$ vectors, and K is a matrix for which

\[ X'BX = K'K. \tag{33} \]

The matrix (X'BX) is symmetric and should be factored, say by the
diagonal or square root method (6), to obtain K'. $K^{-1}$ can then be computed
and applied in (32). It was seen above that

\[ B = A + cE + \tfrac{1}{2}c^{2}H, \tag{21} \]
\[ BX = (AX) + c(EX) + \tfrac{1}{2}c^{2}(HX). \tag{34} \]


Thus, (BX) can be found without computing B. It should be pointed out 
that (AX), (EX), and (HX) are already available from the computations 
involved in (29). In practice, however, it might be desirable to compute B 
anyway, in order to evaluate residuals. 
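The factoring step of (32) and (33) can be sketched as follows. This is a minimal illustration under stated assumptions: B is taken as the exact rank-two matrix for an assumed centered 3 × 3 unit grid, the two trial vectors in X are arbitrary unit vectors rather than latent vectors, and the helper names are hypothetical. The square root method is written here as an ordinary Cholesky factorization.

```python
import math

def matmul(a, b):
    return [[sum(x * y for x, y in zip(r, col)) for col in zip(*b)] for r in a]

def transpose(a):
    return [list(r) for r in zip(*a)]

def cholesky_upper(m):
    """Upper-triangular K with K'K = m (square root method)."""
    p = len(m)
    low = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            acc = m[i][j] - sum(low[i][t] * low[j][t] for t in range(j))
            low[i][j] = math.sqrt(acc) if i == j else acc / low[j][j]
    return transpose(low)

def inverse_upper(k):
    """Inverse of an upper-triangular matrix by back substitution."""
    p = len(k)
    inv = [[0.0] * p for _ in range(p)]
    for col in range(p):
        for i in range(p - 1, -1, -1):
            rhs = (1.0 if i == col else 0.0) - sum(k[i][t] * inv[t][col]
                                                   for t in range(i + 1, p))
            inv[i][col] = rhs / k[i][i]
    return inv

# Assumed example: centered 3 x 3 unit grid, so B = FF' exactly with rank two.
coords = [(col - 1.0, 1.0 - row) for row in range(3) for col in range(3)]
B = [[u[0] * w[0] + u[1] * w[1] for w in coords] for u in coords]

# Hypothetical trial vectors: unit vectors picking out the first two stimuli.
X = [[1.0 if j == i else 0.0 for i in (0, 1)] for j in range(9)]

BX = matmul(B, X)
K = cholesky_upper(matmul(transpose(X), BX))   # X'BX = K'K, as in (33)
F = matmul(BX, inverse_upper(K))               # (BX)K^{-1} = F, as in (32)
FFt = matmul(F, transpose(F))
max_gap = max(abs(FFt[j][k] - B[j][k]) for j in range(9) for k in range(9))
print(max_gap)  # F reproduces B up to rounding error
```

The resulting F is a rotation of the grid coordinates rather than the coordinates themselves, which is all that (3) requires.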

Before summarizing this procedure, it is important to point out that the
additive constant problem should be regarded in the light of the purpose for
which a multidimensional scale is being constructed. The investigator might,
on the one hand, wish to do a broad exploratory study [cf. (1)], or, on the other
hand, he may be interested in a fairly sensitive analysis [cf. (8)]. In the latter
case, a slight dislocation of the stimuli in the psychological space might be
damaging, and it would be important to get a near-exact solution for c. In
the former case, however, where only the broad general structure of the
space is of interest, there is apt to be little or no loss in choosing c larger than
the best value. (Note how small the distortion is in the configuration in Figure
3 for c = 4.)

A very convenient, crude approximation to c in this case is as follows:
Make all of the relative interpoint distances positive by adding some arbitrary
positive number to every element of $S_{jk}$. A gross approximation to the
solution would be to ignore the additive constant problem from this point on,
i.e., consider these relative interpoint distances as absolute interpoint
distances and proceed with the analysis. Since the underestimation of c could
lead to an imaginary space or to the addition of extraneous dimensions, this
arbitrary positive number should be chosen large. If the smallest $s_{jk}$ is felt to
represent a distance judgment fairly close to a true psychological zero, the
average of the $s_{jk}$ will be a good choice for c. If the stimuli are thought to be
psychologically disparate from each other, the largest $s_{jk}$ can be chosen for c.
However, just selecting a large value for the additive constant without
applying the general solution given above can also be dangerous and should
be done only with extreme caution. It is apparent in Figure 2 that for large
values of c some of the small roots take on appreciable values and may thus
be included in the selection of the large roots. Extreme care should be taken,
then, in interpreting any dimensions obtained without applying the general
solution for the additive constant, for some of them may have been added
in this extraneous fashion.


Procedure for Obtaining the Additive Constant in Multidimensional Scaling 


1. Using any of the suggested scaling procedures, obtain a set of relative interpoint
distances. Adjust these distances so that the smallest scale value is equal to zero to produce
$S_{jk}$. Also $s_{jj} = 0$. Then construct a matrix $S^{2}$ of squared relative distances, where each
element is $s_{jk}^{2}$.

2. Compute $\sum_{k} s_{jk}$; $\sum_{j} s_{jk}$; $\sum_{j}\sum_{k} s_{jk}$; $\sum_{k} s_{jk}^{2}$; $\sum_{j} s_{jk}^{2}$; $\sum_{j}\sum_{k} s_{jk}^{2}$;
$\frac{1}{n}\sum_{k} s_{jk}$; $\frac{1}{n}\sum_{j} s_{jk}$; $\frac{1}{n^{2}}\sum_{j}\sum_{k} s_{jk}$; $\frac{1}{n}\sum_{k} s_{jk}^{2}$; $\frac{1}{n}\sum_{j} s_{jk}^{2}$; $\frac{1}{n^{2}}\sum_{j}\sum_{k} s_{jk}^{2}$.

3. Substitute the proper coefficients in (17) to obtain the sum of the diagonals of B.



4. Construct matrix A, see (22), and matrix E, see (23). 

5. Select a trial value for c. Usually the largest entry in $S_{jk}$ will suffice.

6. Obtain a B matrix by inserting the trial c into (13) or (21). 

7. Extract the large principal components ($X_i$ vectors) of B. The centroid method
may be used as an approximation. Iteration of the solution will then give closer
approximations to principal axes as well as closer approximations to c. If, however, the trial value
of c happens to be nearly the correct value, these $X_i$ vectors will be very close to
proportionality with the columns of F, the eventual matrix of projections.

8. Normalize each $X_i$ vector so that the sum of the squares of its elements is equal
to unity.

9. Compute the coefficients for the equation for the sum of the latent roots of B;
see (29).

10. Set the expression for the sum of the latent roots (29) equal to the expression for
the sum of the diagonals (17) found in step 3. Solve the resulting quadratic for c, selecting
the value of c which produces the largest $\sum \beta$.

11. Insert this value of c into (13) or (21) to obtain a B matrix. Factor this B matrix
to obtain the final F. B = FF’. Alternately, utilize (AX) and (EX) from step 9 and construct
(BX) using (34). Then compute the matrix (X’BX). The latter is symmetric and
should be factored by the diagonal method to obtain K’. Compute $K^{-1}$ and insert it into
(32).

12. Iterate this solution for c by substituting columns of F for the $X_i$ vectors in
(29) above. Set this new equation for $\sum \beta$ equal to (17) and solve for a closer approximation
to the additive constant. Further experience with this procedure may demonstrate that
steps 1-11 give such a close approximation to c that this iteration is unnecessary.


A RAPID NON-PARAMETRIC ESTIMATE OF MULTI-JUDGE
RELIABILITY*


DESMOND S. CARTWRIGHT
UNIVERSITY OF CHICAGO 


A technique is presented for obtaining a rapid estimate of reliability 
between judges, with special reference to qualitative judgments. It is shown 
that reliability and discrimination are independent and that estimates of both 
are needed. A method of obtaining an independent estimate of multi-judge dis- 
crimination is developed. It is shown that the size of item-samples is specified 
by the latter method. Tests of significance for both reliability and discrimina- 
tion are described. 


Whenever judgments or ratings are made, experimenters are faced with 
the problem of obtaining the reliability of such judgments. Frequently the 
average intercorrelation among the judges is computed, despite the violations 
of basic assumptions involved. If the judgments are not made on a metric 
system, however, this procedure cannot be used anyway. Three alternative 
methods have been proposed by Kogan and Hunt (6), employing the ¢ 
statistic and analysis of variance. These authors remain unsatisfied with 
their methods because they still involve violations of basic assumptions. 
The procedures devised by Guetzkow (5) for estimating reliability in the 
coding of verbal material have been criticized by Tukey, who argues that 
Guetzkow incorrectly assumes that a “unit is either correctly classified 
with 100 per cent certainty, or it is classified at random” (8, p. 68). A further
objection to Guetzkow’s procedure is that it assumes theoretical accuracy 
of classification, which in many cases may be unwarranted. Whether or not 
a particular item is correctly classified may be a meaningful or verifiable 
question only in terms of some further classification or judgment, perhaps 
that of an expert, but still a judgment. Moreover the methods are very 
time-consuming, especially for three or more judges, and require sample 
sizes of 100-150 items. This factor may become of considerable importance 
in cases where several iterations of the reliability estimate are required, 
as in the initial building of a category system, or in the training of judges 
on an established system. 

In the course of establishing a category system for analyzing the results 


*This technique was developed in connection with research at the Counseling Center, 
University of Chicago. This investigation was supported by a research grant (PHS M 903) 
from the National Institute of Mental Health, of the National Institutes of Health, Public 
Health Service. Acknowledgment is made to Dr. Lyle V. Jones, University of Chicago, for 
pager discussions relevant to the technique. Responsibility rests, however, entirely with 
the writer. 
of a role-playing experiment, the writer had need of a rapid estimate of
multi-judge reliability. Because of the difficulties involved in extant pro- 
cedures, and further because the units of verbal material were to be classified 
on a non-metric array of categories, a rapid non-parametric technique was 
devised. 


Introduction to the Method 


The general situation concerned is that where a number of persons or 
objects are each to be classified or rated by a number of judges in terms 
of some system. Such a system may be a set of diagnostic classes, role de- 
scriptions, or behavior ratings for persons. It may be any category set, as 
defined by Guetzkow, for verbal material; that is, “a number of classes or
‘pigeon-holes’ into which the units of qualitative data may be placed” 
(5, p. 48). Where ratings are used, the scale employed may be treated as a 
category set if, and only if, there is no assumption regarding the distribution 
of ratings. Thus a seven-point scale system may be said to have seven 
independent classes, named 1, 2, ..., and 7, provided all classes are equally
likely to be used. 

The following assumptions underlie the use of the technique to be 
described: (a) the items to be judged are independent; (b) the category 
set, or classification system, is finite; (c) the classes in the system are in- 
dependent; (d) the classes are equiprobable; (e) the judges operate 
independently. 

The basic unit is an agreement between two judges. Between J judges
there are J(J − 1)/2 possible relations, all or some of which may be either
agreements or disagreements. There are 10 possible relations between five
judges. If they all agree, then the maximum number of agreements is achieved,
namely, 10. If four agree together and one disagrees, then there are only
4(4 − 1)/2 = 6 agreements. If three agree together and two agree together
but differ from the first three, then there are 3(3 − 1)/2 + 2(2 − 1)/2 = 4
agreements. The number of agreements actually achieved by J judges is to
be compared with the maximum possible number of agreements. The com- 
parison leads to a coefficient indicating the proportion of agreement achieved. 
This coefficient will be called alpha. 
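
The counting rule just described can be sketched in a few lines of Python. This is only an illustration: the function names and the single-letter class labels are hypothetical and do not come from the original article.

```python
from collections import Counter


def agreements(judgments):
    """Count pairwise agreements among the judges' class choices for one item.

    A group of n judges who chose the same class contributes n(n - 1)/2 agreements.
    """
    return sum(n * (n - 1) // 2 for n in Counter(judgments).values())


def max_agreements(num_judges):
    """Maximum possible agreements on one item: all J judges choose the same class."""
    return num_judges * (num_judges - 1) // 2


# The three five-judge situations worked through in the text.
print(agreements(["A", "A", "A", "A", "A"]))  # all five agree
print(agreements(["A", "A", "A", "A", "B"]))  # four agree, one disagrees
print(agreements(["A", "A", "A", "B", "B"]))  # three agree, two agree together
print(max_agreements(5))
```

Under the counting rule in the text, these calls give 10, 6, 4, and 10 respectively.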


Method of Computing Alpha 


Given a category set containing K classes, judges numbering J, and a 
sample of N items for testing the multi-judge reliability, the set of independent 
judgments for each item is evaluated in terms of how many judges agree and 
disagree. The particular combination of agreements and disagreements is 
called a case. The number of possible cases varies with J and K. For any 
particular J, the possible cases are constant over all values of K ≥ J. For
K < J, however, not all cases are possible. For example, if J = 5 and K = 4,
at least two judges must agree, so that the case of all disagree is impossible.
Since the judgments are made independently, it does not matter who in 
fact judges first, or who agrees with whom. The concern is solely with the 
number of agreements. Each of the cases for a particular J and K is associated 
with a specific number of agreements. In Table 1, the possible cases for 
J = 5, K ≥ 5, are described in words and in a convenient shorthand. Also
shown are the numbers of agreements associated with the various cases. 

TABLE 1

The Cases and the Number of Agreements Associated with Them for J = 5, K ≥ 5

 #   Verbal description                          Shorthand    Agreements
 1   All five agree.                             5:0          10
 2   Four agree; one disagrees.                  4:1           6
 3   Three agree; two disagree with them,        3:2           4
     but agree together.
 4   Three agree; two disagree with them,        3:1:1         3
     and also between themselves.
 5   Two agree; two disagree with them,          2:2:1         2
     but agree together; one disagrees
     with all four others.
 6   Two agree; three disagree with them,        2:1:1:1       1
     and between themselves.
 7   All five disagree.                          1:1:1:1:1     0

Table 1 deals with the cases for judgments on one item only. Over a
sample of N items, alpha is computed as follows: (a) for every item the
number of agreements achieved is tabulated; (b) these entries are then
summed, giving the total number of agreements between the judges over N
items; (c) this total is then entered as numerator over a denominator which 
is the maximum possible number of agreements between J judges on K 
classes over N items. The resultant ratio is alpha. A convenient tabular 
procedure is exemplified in Table 2, for which the data were taken from an 
actual experiment using the Thematic Apperception Test. Judges were 
asked to classify stories for the source of activity in the relationship between 
the hero and the other character. There were six possible categories:
completely from self (CS); primarily from self (PS); mutuality (M); primarily
from other (PO); completely from other (CO); and, no evidence (NE). The 
experimenter felt that the assumptions for using alpha were fulfilled. 
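
As a rough sketch of steps (a) to (c), the snippet below recomputes alpha from the case tallies reported in Table 2 (column sums 4, 5, 1, 3, 1, 4, 0). The function name and the dictionary layout are illustrative assumptions, not part of the original procedure.

```python
def alpha_from_case_counts(case_counts, num_judges):
    """Compute alpha from a tally of cases.

    case_counts maps a shorthand such as "4:1" to the number of items showing
    that case. Each group of size n in the shorthand contributes n(n - 1)/2
    agreements.
    """
    def agreements(shorthand):
        return sum(n * (n - 1) // 2 for n in map(int, shorthand.split(":")))

    n_items = sum(case_counts.values())
    total = sum(agreements(case) * count for case, count in case_counts.items())
    maximum = n_items * num_judges * (num_judges - 1) // 2
    return total, maximum, total / maximum


# Case tallies taken from the "Sum" row of Table 2 (J = 5, K = 6, N = 18).
table2 = {"5:0": 4, "4:1": 5, "3:2": 1, "3:1:1": 3,
          "2:2:1": 1, "2:1:1:1": 4, "1:1:1:1:1": 0}

total, maximum, alpha = alpha_from_case_counts(table2, num_judges=5)
print(total, maximum, round(alpha, 3))  # 89 180 0.494
```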


Establishing the Probability Value of Alpha 


In Table 2, alpha is seen to be .494. In practice the p-value of alpha 
may be found by entering Table 3 under the appropriate J and K, with 
alpha rounded to two decimal places. In this instance for J = 5, K = 6, 
alpha is found to be significant at better than the .05 level. 

Use of Table 3 presupposes that the assumptions for computing alpha 
are fulfilled. It will be seen that independence of several judgments by any 
one judge is not assumed. On the contrary, it is assumed that several judg- 
ments by one judge are not independent. Such an assumption is based on 
the general principles of behavior. At the extremes, phenomena like response 
fixation warrant this assumption. In general, something like a
“learned-judgment-disposition” must be posited. The concern here, therefore, is
with the mean judging performance of judges. More specifically, the alpha 
computed over a sample of items represents the mean case of agreement 
achieved by the judges as a group. 

The underlying theory of case-probability and the manner in which 
Table 3 was prepared will be briefly described. 

The relative frequency of the various cases for any J and K may be 
expressed as 


$$\frac{\text{number of different ways of obtaining a particular case}}{\text{number of different ways of obtaining all possible cases}}$$


Since it is immaterial which category the first judge selects, the algebraic 
denominator of the foregoing expression becomes 


$$K^{J-1}. \qquad (1)$$


TABLE 2

Example of Computing Alpha on Data Taken From a Thematic Apperception Test Experiment with J = 5, K = 6, N = 18

Each of the 18 items receives one tally mark in the column of its case.

Cases                  5:0   4:1   3:2   3:1:1   2:2:1   2:1:1:1   1:1:1:1:1
Sum (items)              4     5     1       3       1         4           0
Agreements per case     10     6     4       3       2         1           0
Product                 40    30     4       9       2         4           0

Sum products = 89. Maximum agreements = 18 × 10 = 180.

Alpha = 89/180 = .494.


The several cases for any J and K may be equivalently expressed as
the presence or absence of partitioning among the judges. The case of perfect
agreement is equivalently expressed as the absence of partitioning. The
case 4:1 for J = 5 may be expressed as the partitioning of J into two groups,
say (a, b, c, d) and (e). The assumption that the K classes are equiprobable
is equivalent to the assumption that the judges are equally likely to select
any of the K classes. Under these circumstances, the probability that J
judges will select exactly r of the possible K classes can be shown to be
(2, p. 26)

$$\frac{1}{K^J} \cdot \frac{K!}{(K-r)!} \cdot \frac{\Delta^r 0^J}{r!}. \qquad (2)$$

For the case of perfect agreement r = 1; all judges select the same one
class. Hence the third term of (2) becomes unity, and the remaining terms
reduce to

$$\frac{1}{K^J} \cdot \frac{K!}{(K-1)!} = \frac{K}{K^J} = \frac{1}{K^{J-1}}. \qquad (3)$$

The probability of the case of perfect agreement is seen to be the
reciprocal of the total number of different ways in which J judges may behave
to obtain all possible cases, namely, the reciprocal of (1).

Cases involving disagreement are those in which judges may be said to 
be partitioned into two or more groups. As an example, the case 2:1:1:1 for 
J = 5 may be expressed as the partitioning of five judges into four groups. 
The probability of this case for J = 5 and K = 5 is given by substituting in 
(2) with r = 4, yielding 

$$\frac{1}{5^5} \cdot \frac{5!}{(5-4)!} \cdot \frac{\Delta^4 0^5}{4!}.$$


The right-hand term of (2) has been tabulated by Fisher (2, p. 78), so 
that computation of leading rth differences of the Jth powers of natural 
numbers commencing at zero is unnecessary. Using Fisher’s table for the 
present case 2:1:1:1 for J = 5 and K = 5 the third term is found to be 10, 
and the probability for the case is found to be .38400. 
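
Where Fisher's table is not at hand, the third term of (2) can be computed directly. The sketch below assumes the standard finite-difference identity $\Delta^r 0^J = \sum_{i=0}^{r} (-1)^{r-i} \binom{r}{i} i^J$. That identity is not stated in the article, and the function names are illustrative.

```python
from math import comb, factorial


def leading_difference_of_zero(r, J):
    """Delta^r 0^J: the r-th leading difference of the J-th powers of 0, 1, 2, ..."""
    return sum((-1) ** (r - i) * comb(r, i) * i ** J for i in range(r + 1))


def prob_exactly_r_classes(J, K, r):
    """Expression (2): probability that J judges use exactly r of K equiprobable classes."""
    if r < 1 or r > K:
        return 0.0
    second_term = factorial(K) // factorial(K - r)                     # K!/(K - r)!
    third_term = leading_difference_of_zero(r, J) // factorial(r)      # Delta^r 0^J / r!
    return second_term * third_term / K ** J


# The worked case 2:1:1:1 for J = 5, K = 5 uses r = 4.
print(leading_difference_of_zero(4, 5) // factorial(4))  # third term
print(prob_exactly_r_classes(5, 5, 4))                   # probability of the partition
```

For J = 5 and K = 5 with r = 4 this gives a third term of 10 and a probability of .384, in agreement with the figures quoted from Fisher's table.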

There are instances in which two or more different cases may each be 
represented as the partitioning of J judges into r groups. For J = 5 and K = 5
the cases 4:1 and 3:2 each partition J into two groups but differ as to size 
of the groups. Since (2) gives the probability for the total partition, there 
is need to separate the respective probabilities for each case. This is done by 
the usual method of proportional parts. 

Following Whitworth (9, pp. 58-9), the number of ways in which 
J = (x + y) judges may be divided into two groups so that one contains x 
and the other y judges is 


$$(x + y)!/(x!\,y!). \qquad (4)$$

If J is an even number, the situation may arise where x = y. For J = 4
the case 2:2 would be one instance of this. There are six possible partitions
of the form (x)(y):

(ab)(cd);  (ac)(bd);  (ad)(bc);
(cd)(ab);  (bd)(ac);  (bc)(ad).

Except for the assignment to x or y, the second three partitions duplicate 
the first three. Since the x or y assignment is irrelevant for the present 
purposes, when x = y a correction factor must be introduced to divide out 
duplicate partitions. This may be achieved by taking the factorial of the 
number of numerically equivalent groups in the partition. Let that number 
be e. Then, for J = 4 a partition of 2:2 has e = 2. For J = 5 a partition of
2:1:1:1 has e = 3. The general expression required for dividing J = (x + y) 
judges into two groups is 


$$J!/(x!\,y!\,e!). \qquad (5)$$


Where e = 0, e! = 1, in which case (4) is unchanged. 
Since the right-hand member of (2) gives the number of ways of dividing
J judges into r groups of all possible sizes, and since (5) gives the number of
ways of dividing J judges into two groups of any particular size, the summation
of (5) over all possible group sizes for J = x + y and r = 2 will yield
the identity

$$\sum_{i=1} \frac{J!}{x_i!\,y_i!\,e_i!} = \frac{\Delta^2 0^J}{2!}. \qquad (6)$$

For J = 5 and K = 5 the cases 4:1 and 3:2 both entail the partitioning
of J into r = 2 groups. In both cases e = 0. Substituting in the left-hand side
of (6) yields 5 + 10. Entering Fisher’s table for $\Delta^r 0^J / r!$ with r = 2 and
J = 5, the result is 15. From (2) the total probability associated with the
two cases is .09600. The probabilities associated with each case are therefore

(4:1): .09600 × 5/15 = .03200

and

(3:2): .09600 × 10/15 = .06400.


Where J judges are to be partitioned into r = 3 groups, (6) becomes 


$$\sum_{i=1} \frac{J!}{x_i!\,y_i!\,z_i!\,e_i!} = \frac{\Delta^3 0^J}{3!}. \qquad (7)$$

For J = 5 and K = 5 the cases 3:1:1 and 2:2:1 both entail the partitioning
of J into r = 3 groups. In both cases e = 2. The probability for the total
partition is given by (2) as .48000. Substituting in (7) yields 10 + 15 = 25.
Hence the probabilities associated with each case are

(3:1:1): .48000 × 10/25 = .19200

and

(2:2:1): .48000 × 15/25 = .28800.


From (2), in conjunction with (6) and (7), or the extensions of the 
latter for any r, the exact probabilities of all cases for any J and K may be 
derived. The set of probabilities for J = 5 and K = 5 are shown in Table 4. 
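
The derivation can be checked numerically. The sketch below enumerates every case for a given J and K and computes its probability directly as (ordered choices of r classes) × (ways of splitting the judges into groups of the given sizes) / K^J, which is algebraically the same as taking proportional parts of (2). Two points are assumptions rather than the article's own statement: when a case has more than one set of equal-sized groups, the e! correction is applied as a product of factorials, and all function names are illustrative.

```python
from collections import Counter
from math import factorial, prod


def partitions(n, max_part=None):
    """Yield the integer partitions of n as non-increasing tuples, e.g. (3, 1, 1)."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def ways_to_split(sizes):
    """Number of ways to divide sum(sizes) judges into unlabeled groups of these sizes."""
    denom = prod(factorial(s) for s in sizes)
    # Correction for numerically equivalent groups (the e! factor in the text).
    denom *= prod(factorial(m) for m in Counter(sizes).values())
    return factorial(sum(sizes)) // denom


def case_probabilities(J, K):
    """Map each possible case (tuple of group sizes) to its probability."""
    probs = {}
    for sizes in partitions(J):
        r = len(sizes)
        if r > K:
            continue  # not enough classes for this many distinct groups
        labelings = factorial(K) // factorial(K - r)  # K!/(K - r)!
        probs[sizes] = labelings * ways_to_split(sizes) / K ** J
    return probs


for sizes, p in case_probabilities(5, 5).items():
    print(":".join(map(str, sizes)), f"{p:.5f}")
```

For J = 5 and K = 5 the seven values printed agree with the p-values listed in Table 4, and they sum to 1.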

As may be seen from Table 4, the p-values are associated with particular 
cases. The alpha coefficients for these cases are exact for a set of judgments 
on one item. In practice, however, an alpha coefficient represents the mean 
case of agreement achieved over N items. Under these conditions, alpha 
may vary continuously between 0.00 and 1.00. The probabilities associated 
with values of alpha intermediate between two adjacent cases may be assumed 
to lie correspondingly between the two associated probability values. Thus, 
for J = 5 and K = 5 an alpha of .57 would lie between cases 2 and 3, namely,
between .60 and .40 (cf. Table 4). The p-value of that alpha would lie between
the p-values associated with cases 2 and 3, namely, between .03200 and 
.06400. An estimate of the p-value for alpha may be made by linear
interpolation. For alpha = .57, in the present example, p = .03200 + 3(.06400 − .03200)/20 = .03680.
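
The interpolation step can be sketched as follows, using the alpha and p-value pairs of Table 4 for J = 5, K = 5. The function name and the list layout are illustrative assumptions.

```python
def interpolate_p(alpha, cases):
    """Linearly interpolate a p-value for alpha between the two adjacent cases.

    cases is a list of (alpha, p) pairs for the single-item cases, in any order.
    """
    points = sorted(cases, reverse=True)  # descending alpha
    for (a_hi, p_hi), (a_lo, p_lo) in zip(points, points[1:]):
        if a_lo <= alpha <= a_hi:
            fraction = (a_hi - alpha) / (a_hi - a_lo)
            return p_hi + fraction * (p_lo - p_hi)
    raise ValueError("alpha lies outside the tabulated range")


# (alpha, p-value) pairs from Table 4, J = 5, K = 5.
table4 = [(1.00, 0.00160), (0.60, 0.03200), (0.40, 0.06400), (0.30, 0.19200),
          (0.20, 0.28800), (0.10, 0.38400), (0.00, 0.03840)]

print(round(interpolate_p(0.57, table4), 5))  # 0.0368
```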

TABLE 4

The Cases, Alpha Coefficients, and Probability Values for J = 5, K = 5

 #   Shorthand    Agreements   Alpha   P-values
 1   5:0          10           1.00    .00160
 2   4:1           6            .60    .03200
 3   3:2           4            .40    .06400
 4   3:1:1         3            .30    .19200
 5   2:2:1         2            .20    .28800
 6   2:1:1:1       1            .10    .38400
 7   1:1:1:1:1     0            .00    .03840


Table 3, giving the values of alpha required for various levels of probability
for J = 3 to 5 and K = 4 to 16, was prepared on the basis of the
argument in the foregoing paragraphs. For a particular value of J, the 
probabilities associated with each possible case over the values of K from 
4 to 16 were computed directly and set out in tabular form. The column 
of p-values in Table 4 is typical of this preliminary tabulation. In the pre- 
liminary tables, thirteen such columns appeared, one for each value of K. 
Values of alpha intermediate between adjacent cases were inserted into the 
preliminary table, with successive intervals of .01. Probabilities for the 
intermediate values of alpha were computed by linear interpolation between 
the p-values of adjacent cases. The interpolations were taken to five decimal 
places. From the preliminary tables, the values of alpha having p-values at 
or slightly less than .10, .05, .025, etc., were abstracted to form Table 3. 


Sample Size for Items 


The question of sample size is important in two ways. First, the nature 
of much work in psychology precludes the possibility of obtaining large 
samples, particularly for pilot reliability studies. Most experimenters cannot 
easily obtain 150 Rorschach protocols on which to try out a new category 
system. Second, the size of samples is important for the methodological
problems associated with judges’ discrimination among the classes of the 
system. It will be shown below that the minimum sample size required is 
equal to three times the number of classes in the system, that is, N (min.) = 
3K. This minimum is required by the method of obtaining an estimate of 
judges’ discrimination. It also serves, however, as a useful lower limit of 
sample size for estimating alpha. 


Sample Size and Judge Discrimination 


There are four possible extreme situations in the relations between 
multi-judge agreement and multi-judge discrimination. These are: maximum 
agreement and maximum discrimination; maximum agreement and minimum 
discrimination; minimum agreement and maximum discrimination; minimum 
agreement and minimum discrimination. The first of these situations is 
exemplified, for J = 3, K = 4, and N = 12, by the judges achieving perfect 
agreement on all items with three items placed in K1, three in K2, three 
in K3, and three in K4. This situation gives rise to a theorem which states 
that for a reasonable estimate of discrimination the sample must number 
a multiple of K. To prove this theorem note that the case of perfect agree- 
ment can be equipossible among the K classes if, and only if, the sum of 
possible cases (that is, the number of items available for classification in 
any K) is the same for all K. In the technique described below, it will be 
seen that, for maximum discrimination, chi-square is always zero. This is 
possible under all conditions only if the number of items is equal to a multiple 
of K. 

The situation of maximum agreement and minimum discrimination 
would be exemplified for J = 3, K = 4, and N = 12, by the judges agreeing 
perfectly on all items, but with all items placed only in one class, say K1. 
In other words, though the judges agree perfectly, it may be due to the 
fact that all the items look the same to them—they all look like K1. 

In the attempt to develop a measure of multi-judge discrimination, the 
methods of measuring the amount of transmitted information proposed by 
Garner and Hake (4) were examined. It was not found possible to use any 
of these methods. But the comparability between these methods and those 
of analysis of variance, which is pointed out by the authors, suggested the 
possibility of using rank analysis of variance. The technique to be described 
adapts analysis of variance by ranks, as developed by Pitman (7), Friedman 
(3), and Wilcoxon (9) to the kind of data with which the present problem of 
discrimination is concerned. In this technique good multi-judge discrimina- 
tion is indicated by a non-significant value of chi-square. Poor discrimination 
is indicated by a significant value of chi-square. For the situation of minimum 
agreement and minimum discrimination, where each judge places all items 
in one class and no two judges use the same class, it is possible for a
non-significant chi-square to result from certain J and K values unless the number
of items (or replications) is greater than 2K. Since, from the argument above, 
the number of items in the sample must be a multiple of K, the minimum 
number of items in a sample must therefore be equal to 3K. 


Estimating the Multi-Judge Discrimination 


The method of estimating the multi-judge discrimination is the method 
of analysis of variance by ranks. A convenient exposition of the method 
and formula for computing chi-square, together with a reproduction of the 
chi-square chart developed by Bliss (1), is presented by Wilcoxon (9). The 
method of preparing the data for this analysis in regard to judge discrimi- 
nation is as follows: 

A table is prepared having K × N cells. The classes of the K system
are considered as treatments. The items of the sample are considered as 
replications. On each item the number of judges placing the item in a 
particular category or class is entered in the appropriate cell for that item 
row and class column. The entries are made for all items. Each row is then 
ranked, with the highest rank being assigned to the cell (class) having the 
largest number of judges, and the lowest rank being assigned to the cell 
having the least number of judges. Ties are scored in the usual way. The 
columns are then summed, the sums are squared, and the squares are summed. 
This last figure is entered in the equation for chi-square. 

An example of the procedure is given in Table 5. The data for this example 
are the same as those for Table 2. 
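
The ranking and chi-square steps can be sketched in plain Python. The judge counts below are transcribed from Table 5 (rows are items; columns are CS, PS, M, PO, CO, NE). Ties receive the mean of the ranks they span, which is assumed to be what "scored in the usual way" means, and no tie correction is applied to chi-square, matching the formula shown with the table. Function names are illustrative.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def rank_chi_square(table):
    """Return (rank totals per class, chi-square) for an items-by-classes table."""
    n, p = len(table), len(table[0])
    totals = [0.0] * p
    for row in table:
        for col, rank in enumerate(average_ranks(row)):
            totals[col] += rank
    sum_of_squares = sum(t * t for t in totals)
    chi2 = 12.0 / (n * p * (p + 1)) * sum_of_squares - 3 * n * (p + 1)
    return totals, chi2


# Number of judges (out of 5) choosing each class, per item, from Table 5.
counts = [
    [0, 1, 1, 2, 0, 1], [3, 2, 0, 0, 0, 0], [0, 0, 3, 1, 1, 0],
    [0, 4, 0, 0, 1, 0], [0, 0, 0, 0, 0, 5], [1, 2, 0, 0, 0, 2],
    [1, 2, 1, 1, 0, 0], [0, 0, 0, 5, 0, 0], [1, 4, 0, 0, 0, 0],
    [0, 0, 0, 1, 4, 0], [1, 1, 1, 2, 0, 0], [1, 4, 0, 0, 0, 0],
    [4, 1, 0, 0, 0, 0], [0, 1, 1, 3, 0, 0], [0, 0, 0, 0, 0, 5],
    [0, 0, 3, 1, 1, 0], [1, 1, 1, 2, 0, 0], [0, 4, 0, 0, 1, 0],
]

totals, chi2 = rank_chi_square(counts)
print(totals)
print(round(chi2, 3), "with", len(counts[0]) - 1, "degrees of freedom")
```

Run on these counts, the sketch gives rank totals of 62.0, 77.5, 61.0, 71.0, 54.0, and 52.5 and a chi-square of 7.468 with 5 degrees of freedom, the figures shown in Table 5.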

From Table 5 it is seen that the chi-square has a p-value of .183. This 
value is clearly non-significant and adequately represents the distribution 
of class selections by the judges. There is little or no concentration on one 
or a few classes, and hence the discrimination is good. In other instances, 
some concentration may appear and be manifested in a value of chi-square 
approaching significance. In such an instance, where for example the p-value 
was .075, it would be a matter for the experimenter to decide whether or 
not the discrimination is good enough, in terms of the kind of items presented 
and other factors. 

In general, p-values between .10 and .06 will require careful consideration 
in relation to the alpha achieved by the judges for agreement. Values of 
chi-square which have a p-value greater than .10 may be taken as indicating 
good discrimination on the part of the judges. Values of chi-square which 
have a p-value of .05 or less may be taken to represent poor discrimination. 
In such circumstances it may be possible to combine two or more classes, 
thus raising discrimination efficiency on a reduced number of classes. Such 
a course would require recomputation of alpha on the revised K system. 

TABLE 5

Example of Estimating Multi-Judge Discrimination on Data Taken from a Thematic Apperception Test Experiment with J = 5, K = 6, N = 18

Item      CS  Rank      PS  Rank       M  Rank      PO  Rank      CO  Rank      NE  Rank
  1        0   1.5       1   4         1   4         2   6         0   1.5       1   4
  2        3   6         2   5         0   2.5       0   2.5       0   2.5       0   2.5
  3        0   2         0   2         3   6         1   4.5       1   4.5       0   2
  4        0   2.5       4   6         0   2.5       0   2.5       1   5         0   2.5
  5        0   3         0   3         0   3         0   3         0   3         5   6
  6        1   4         2   5.5       0   2         0   2         0   2         2   5.5
  7        1   4         2   6         1   4         1   4         0   1.5       0   1.5
  8        0   3         0   3         0   3         5   6         0   3         0   3
  9        1   5         4   6         0   2.5       0   2.5       0   2.5       0   2.5
 10        0   2.5       0   2.5       0   2.5       1   5         4   6         0   2.5
 11        1   4         1   4         1   4         2   6         0   1.5       0   1.5
 12        1   5         4   6         0   2.5       0   2.5       0   2.5       0   2.5
 13        4   6         1   5         0   2.5       0   2.5       0   2.5       0   2.5
 14        0   2         1   4.5       1   4.5       3   6         0   2         0   2
 15        0   3         0   3         0   3         0   3         0   3         5   6
 16        0   2         0   2         3   6         1   4.5       1   4.5       0   2
 17        1   4         1   4         1   4         2   6         0   1.5       0   1.5
 18        0   2.5       4   6         0   2.5       0   2.5       1   5         0   2.5

Total         62.0         77.5          61.0          71.0          54.0          52.5
Squares     3844.0      6006.25        3721.0        5041.0        2916.0       2756.25

Sum of squares = 24284.50

$$\chi_r^2 = \frac{12}{np(p+1)} \sum (\text{rank totals})^2 - 3n(p+1) = \frac{12}{18 \times 6 \times 7} \times 24284.50 - 3 \times 18 \times 7 = 7.468$$

Degrees of freedom = p − 1 = 5. Probability of $\chi_r^2$ = 7.468 is .183.


Since alpha varies independently of judge discrimination, a high value
of alpha should always be checked for its meaningfulness in terms of whether
or not the judges showed good discrimination also. If they did not, it is
possible, though not necessary, that their degree of agreement was related
to their poor discrimination.


A STUDY OF SPEED FACTORS IN TESTS
AND ACADEMIC GRADES* 


FREDERIC M. LORD
EDUCATIONAL TESTING SERVICE 


Speeded and unspeeded tests of vocabulary, spatial relations, and arith- 
metic reasoning were factorially analyzed, together with certain reference 
tests and academic grades. Lawley’s maximum likelihood method was used, 
the computations being carried out on the Whirlwind electronic computer. 
Four different speed factors were isolated, together with a second-order general 
speed factor. Consistent small positive correlations between the academic 
grades and the speed factors were found. 


The speed with which an examinee responds to the items in a test 
frequently affects his score. Almost all achievement and aptitude tests are to 
some extent measures of “speed.” Tests for factor analyses are frequently 
speeded because many tests must be given in a limited time. 

Much remains to be learned about “speed,” in spite of the fact that it is
commonly an element in test scores. Is speed on cognitive tests a unitary 
trait? Or are there different kinds of speed for different kinds of tasks? If 
so, how highly correlated are these different kinds of speed? How highly 
correlated are speed and level on the same task? How do various criteria 
relate to speed, and how speeded should tests to predict these criteria be? 
These are the questions which the present study attacks. 


Some Previous Results 


Factor analytic studies have often isolated a “perceptual-speed factor,”
usually measured by tests requiring simple, rapid visual discriminations. 
“This factor is characterized by the task of (quickly) finding in a mass of 
distracting material a given configuration which is borne in mind during the 
search” (6). Any speed test composed of very easy items is likely to have a 
loading on this factor. A more recent publication (7) breaks down “perceptual 
speed” into at least two factors, “speed of symbol discrimination” and “form
perception,” the former relating to familiar symbols, the latter to unfamiliar
figures.


*The writer is indebted to Dr. John French, to Dr. David Saunders, and especially to
Dr. Ledyard R. Tucker for helpful suggestions and theoretical advice throughout the course
of this study. The active cooperation of Dr. William Shields, Educational Advisor, and of
many others at the United States Naval Academy at Annapolis has been invaluable. The
author is very grateful to Dr. P. Youtz and Dr. C. W. Adams for the opportunity to use
Whirlwind I, a high-speed computer sponsored by the Office of Naval Research, and to
Dr. H. Denman for help in programming and in putting the program on the computer.
He also wishes to express his deep appreciation to Dr. Hubert Brogden and Miss Bertha
Harper of The Adjutant General’s Office for the opportunity to use their matrix rotator and
for helpful guidance in its operation.

Other factors related to speed include finger dexterity, fluency of ex- 
pression, ideational fluency, reaction time, speed of association, speed of 
judgment, tapping, word fluency (6). Speed of closure and motor speed are 
included in (7). Rimoldi (20) finds a “speed of judgment,” a “speed of
cognition,” and a second-order “personal tempo” factor; but his subjects,
like those in many earlier studies, were to work at a “natural, congenial”
speed rather than at the maximal speed required by most tests. 

Since many tests in factor analytic studies are speeded, many of the 
factors are speed factors, although not always so described. An example is 
the “number” factor, which is commonly measured by highly speeded tests 
of addition, subtraction, multiplication, and division. This factor will here 
be referred to as the number-speed factor. 

In spite of the presence of both speeded and unspeeded tests in most 
factor analysis batteries, a general intellectual-speed factor has not routinely 
been found. Studies designed to investigate the existence of both general 
and specific speed factors in ordinary aptitude test batteries have been few 
and have yielded conflicting evidence (3, 4, 17, 18, 21, 22). 

For further consideration of “speed factors,” the reader is referred to 
(24, pp. 80-85) and to the 33 references in (8). 


Data for the Present Study 


The Subjects 


All measures in this study were obtained on 649 students in the entering 
class at the United States Naval Academy at Annapolis. This large number of 
cases was used to obtain clearly interpretable results. 


The Tests 


The study centers around tests of the verbal factor, of spatial ability, 
and of arithmetic reasoning, because of the widespread use of tests in these 
areas. 

In each area, seven tests were administered. One was the regular admis- 
sions examination, denoted by (A), which is only slightly speeded. The 
remaining six were short experimental tests administered at the beginning of 
the school year. These were parallel in content, but different in degree of 
speededness. Two were “level” tests, denoted by (L), involving virtually no
speed. One was moderately speeded (M). The remaining three tests were 
highly speeded (S). In order to confound practice effect insofar as possible, 
the tests were administered in scrambled order, as follows: LSMSLS. The 
examinee was told the degree of speededness that would be required. 
Six reference-factor tests (number, perceptual speed, word fluency) also
were administered. These are designated by (R). 
A more complete description of all the tests follows. 


1. Word Fluency (R). The examinee writes as many words and their
opposites as he can in four minutes. This test was included so as to determine
its relation to the verbal factor and to the verbal-speed factor, if such were
found.


2. Verbal (A). This test contained both word-analogies and “double- 
definitions” items. The latter item type is essentially a sentence with two 
missing words to be selected from alternative pairs of words provided, thus 
producing a simple definition of one of the missing words. 


3, 4. Vocabulary (L). These tests require finding among the choices a 
word opposite in meaning to the given key word. Also 5. Vocabulary (M) and 
6, 7, 8. Vocabulary (S). [9. Vocabulary (LIA) is merely the “last-item- 
attempted score” on test 7, to be discussed below.] 


10. Spatial Relations (A) contained block-counting and “identical- 
blocks” items. The latter require the examinee to indicate which of five 
drawings represents a key block drawn from a different angle. 


11, 12. Intersections (L). These tests require the examinee to visualize 
the two-dimensional outline of the intersection of a solid geometric object 
cut by a plane. Also 13. Intersections (M) and 14, 15, 16. Intersections (S). 
[17. Intersections (LIA) is merely the “last-item-attempted score” on test 15.] 


18. Mathematics (A) is composed of arithmetic reasoning, algebra, and 
geometry items. 


19, 20. Arithmetic Reasoning (L) consist entirely of the usual arithmetic- 
reasoning items. Also 21. Arithmetic Reasoning (M) and 22, 23, 24. Arithmetic 
Reasoning (S). [25. Arithmetic Reasoning (LIA) is merely the “last-item- 
attempted score” on test 23.] 


26, 27. Number Speed (R) are highly speeded reference tests for the 
number-speed factor. 26 consists of simple addition and division, 27 of easy 
subtraction and multiplication. 


28, 29, 30. Perceptual Speed (R) are reference tests for the perceptual- 
speed factor. 28. Cancellation requires the examinee to cross out as many 
letter A’s in a paragraph as he can in two minutes. 29. Picture Discrimination 
requires him to indicate which of three very sketchily drawn faces is different 
from the other two. 30. Number Checking requires him to indicate whether 
two multi-digit numbers are the same or different. 


TABLE 1 

Background Information and Data 
on Speededness for the "Experimental" Tests 

                                        Number    Testing     Items    Per cent* of
Tests                                     of        Time       per      Examinees
                                        Items    (minutes)    Hour      Finishing

 3, 4        Vocabulary            L      15         7         129         97
 5               "                 M      30         5         360     [illegible]
 6, 7, 8         "                 S      75         5         900          2
11, 12       Intersections         L      15        20          45         98
13               "                 M      20        12         100     [illegible]
14, 15, 16       "                 S      35         9         233     [illegible]
19, 20       Arithmetic Reasoning  L      10        20          30     [illegible]
21               "                 M      15        15          60         50
22, 23, 24       "                 S      30        10         180          4

* The mean of two values in the case of the level tests, of three in the 
case of the speed tests. 


Table 1 summarizes the background information about the “experimental” 
tests and shows the proportion of examinees who answered the last 
item in each test. The speeded tests were in fact very highly speeded. There 
is reason to believe that many or all of the examinees who answered the last 
item of the speeded tests skipped many items or responded at random. 
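The "Items per Hour" column of Table 1 follows directly from the number of items and the testing time. The sketch below reproduces that arithmetic. It assumes that the testing times are in minutes (the unit is inferred from the tabled rates rather than stated), and the names `TABLE_1` and `items_per_hour` are introduced here for illustration only.

```python
# (tests, number of items, testing time in minutes) as read from Table 1.
# The time unit is an assumption inferred from the tabled items-per-hour rates.
TABLE_1 = [
    ("3, 4 Vocabulary (L)", 15, 7),
    ("5 Vocabulary (M)", 30, 5),
    ("6, 7, 8 Vocabulary (S)", 75, 5),
    ("11, 12 Intersections (L)", 15, 20),
    ("13 Intersections (M)", 20, 12),
    ("14, 15, 16 Intersections (S)", 35, 9),
    ("19, 20 Arithmetic Reasoning (L)", 10, 20),
    ("21 Arithmetic Reasoning (M)", 15, 15),
    ("22, 23, 24 Arithmetic Reasoning (S)", 30, 10),
]


def items_per_hour(n_items, minutes):
    """Rate of work needed to reach the last item within the time limit."""
    return n_items * 60 / minutes


for label, n_items, minutes in TABLE_1:
    print(f"{label:38s}{items_per_hour(n_items, minutes):6.0f}")
```

On this index the speeded vocabulary tests demand roughly seven times the rate of work of the level vocabulary tests (900 against 129 items per hour).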


Scoring 

The three admissions tests are composed of multiple-choice items 
having five (in a few cases, eight) alternative responses. The score obtained 
for each test was the number of items answered correctly. 

The eighteen experimental vocabulary, intersections, and arithmetic- 
reasoning tests were all composed of five-choice items and were scored number- 
right minus one-fourth-number-wrong. This “correction for guessing” was 
made in order that any speed factor that might be found should not be open 
to the challenge that it was merely a “willingness-to-guess-wildly” factor. It 
would have been wrong to include both corrected and uncorrected scores 
on the same test in a straightforward factor analysis, because of their ex- 
perimental dependence. Some further investigation of the effect of the correc- 
tion for guessing was nevertheless planned. For this purpose, number-right 
(NR) scores were obtained for tests 7, 15, and 23, these new scores being 
designated as variables 37, 38, and 39. 
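The scoring rule for the five-choice experimental tests can be written out explicitly. The sketch below is illustrative only: the function name and the `choices` parameter are assumptions introduced here, and the general form R - W/(k - 1) reduces to number-right minus one-fourth-number-wrong when k = 5.

```python
def corrected_score(num_right, num_wrong, choices=5):
    """Number right minus number wrong divided by (choices - 1).

    Omitted items are neither right nor wrong and do not affect the score.
    """
    return num_right - num_wrong / (choices - 1)


# An examinee with 20 right and 8 wrong on a five-choice test:
print(corrected_score(20, 8))   # 20 - 8/4

# An examinee who guesses blindly on 30 five-choice items is expected to get
# 6 right and 24 wrong, so that the corrected score is zero on the average.
print(corrected_score(6, 24))
```

The second call shows why the correction answers the "willingness-to-guess-wildly" objection: random responding adds nothing, on the average, to the corrected score.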

The score on each of the six reference tests was the number of right 
answers, because this is the usual method for scoring these tests. Scored in 
any other way, they might no longer represent the same reference factors. 

In addition to the regular score, a “last-item-attempted score” (LIA) 
for one speeded test in each of the three areas gives a crude measure of rate- 
of-work. Inclusion of such scores in the present study was considered desirable, 
although in general the study is primarily concerned with the type of scores 
normally used in work with aptitude tests. The statistical method used to 
deal with the experimental dependence of these scores and of the NR scores 
on the other scores obtained from the same tests will be outlined later. 


School Grades 


During their first year, all students at Annapolis normally receive 
grades in each of the following: 


31. English Composition and Literature. 

32. Foreign Language. (Each student selects one of several available.) 
33. Engineering Drawing and Descriptive Geometry. 

34. Chemistry. 


35. Mathematics. (Plane trigonometry, college algebra, plane and solid 
analytic geometry, and calculus.) 


36. Conduct. (The method by which these grades are assigned need not 
concern us, since no factor loadings of interest were found for this variable.) 

In the present study, each numerical grade is averaged over two 
semesters. Each semester course grade represents a combination of day-to-day 
course work and final-examination performance weighted in the ratio of three 
to two. The instructors could not have had knowledge of the test scores, with 
the possible exception of the three admissions tests. 

The final examinations were virtually unspeeded, almost every student 
finishing. The day-to-day work in class varied but was not in general com- 
pulsorily speeded. It is not known whether students felt pressed for time 
while doing their homework assignments. 


Statistical Analysis 


Normalizing 

All variables were normalized before product-moment correlations were 
computed. This was considered desirable since otherwise any speed factors 
that might be found might conceivably have been attributable to certain 
common features in the shapes of the score distributions of the speeded tests 
(e.g., skewness), rather than to a real speed factor. 
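The normalizing procedure itself is not described above. One common way of carrying out such a transformation, offered here purely as an assumption, is to replace each raw score by the unit-normal deviate corresponding to its mid-rank percentile, tied scores sharing their average rank. The function name `normalize` is hypothetical.

```python
from statistics import NormalDist


def normalize(scores):
    """Map raw scores to normal deviates via mid-rank percentiles (assumed method)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i in the sorted order.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1          # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    unit_normal = NormalDist()
    return [unit_normal.inv_cdf((r - 0.5) / n) for r in ranks]


# A skewed set of raw scores; the transformed values keep only the rank order.
print([round(z, 2) for z in normalize([12, 7, 30, 7])])
```

Because only rank order survives the transformation, features of the raw distributions such as skewness cannot by themselves produce common variance among the speeded tests.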


The Correlations 


The use of product-moment correlations is required by the significance 
tests to be described later. The correlation matrix is presented as Table 2. 


TABLE 2 
Matrix of Intercorrelations (decimal point omitted) 

Variables 1 to 19 

       1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19
 1   ---   264   200   262   249   281   341   321   314   056   111   100   064   136   080   090   122   181   158
 2   264   ---   720   720   790   666   732   679   388   178   174   135   165   153   138   126   063   413   341
 3   200   720   ---   669   706   620   693   641   328   134   150   055   102   119   093   079   087   287   300
 4   262   720   669   ---   690   648   697   650   343   126   181   144   184   171   128   121   054   310   316
 5   249   790   706   690   ---   660   745   700   393   136   111   082   115   093   082   075   086   326   265
 6   281   666   620   648   660   ---   775   757   531   176   162   137   176   188   174   161   140   323   276
 7   341   732   693   697   745   775   ---   855 (671)   135   122   061   103   127   107   100   130   337   291
 8   321   679   641   650   700   757   855   ---   609   138   094   067   117   135   104   114   126   334   273
 9   314   388   328   343   393   531 (671)   609   ---   078   060   026   031   127   086   109   291   290   192
10   056   178   134   126   138   176   135   138   078   ---   480   541   548   543   534   553   236   349   290
11   112   174   150   181   111   162   122   094   060   480   ---   722   714   696   701   696   270   312   337
12   100   135   055   144   082   137   061   067   026   541   722   ---   767   714   738   750   246   255   295
13   064   165   102   184   115   176   103   117   031   548   714   767   ---   748   760   770   262   274   337
14   136   155   119   171   093   188   127   135   127   543   696   714   748   ---   796   788   428   266   322
15   080   138   093   128   082   174   107   104   086   534   701   738   760   796   ---   833 (481)   253   312
16   090   126   079   121   075   161   100   114   109   555   696   750   770   788   833   ---   449   295   335
17   122   063   087   054   046   140   130   126   291   236   270   246   262   428 (481)   449   ---   132   105
18   181   413   287   310   326   323   337   334   290   349   312   255   274   266   253   295   132   ---   555
19   156   341   300   316   265   276   291   273   192   290   337   295   337   322   312   335   105   555   ---
20   165   324   251   311   258   290   270   288   176   271   357   302   362   300   286   314   090   594   538
21   154   363   285   339   299   323   313   348   208   331   295   297   344   309   289   324   121   638   606
22   183   335   279   255   300   357   353   364   303   309   289   280   334   310   291   361   174   628   551
23   174   329   275   286   297   333   350   347   282   321   337   280   338   345   338   370   192   602   532
24   216   364   276   301   314   380   364   388   316   275   257   230   252   266   254   307   152   596   514
25   233   195   160   147   194   226   294   290   455   171   124   112   105   192   177   194   394   355   211
26   258   051   047   013   086   133   196   205   339   027  -046  -068  -092  -018  -028   004   102   260   155
27   200   050   044   013   088   165   178   202   358   058  -018  -022  -045   012  -028   037   070   351   193
28   223   058   052   043   097   211   223   233   316   068   103   076   065   114   072   123   150   160   100
29   212   121   084   085   100   245   254   250   354   247   215   243   231   292   283   277   231   194   115
30   122  -062  -038  -023  -056   093   105   112   249  -016  -028  -049  -080  -048  -022   005   052   113   026
31   303   560   497   568   537   519   590   540   373   120   143   093   102   073   037   058   003   302   309
32   184   210   172   205   192   186   220   226   204   082   136   125   100   110   100   106   019   177   192
33   170   184   084   186   138   247   192   221   182   476   546   582   605   574   568   584   225   308   355
34   249   230   172   238   196   270   258   248   228   238   321   310   318   327   292   320   141   376   424
35   207   156   119   145   128   213   211   210   258   218   328   288   300   308   283   326   143   435   412
36  -082  -051  -106  -017  -064  -034  -038  -019  -020   055   019   025   056   023   023   037  -054   059   072
37   346   707   665   673   724   770 (986)   856 (745)   124   111   049   087   127   102   099   171   334   277
38   097   135   099   124   082   181   120   118   139   522   678   698   722   790 (977)   825 (625)   256   295
39   216   325   257   271   295   341   362   362   368   326   329   272   317   352   340   366   297   597   507

Variables 20 to 39 

      20    21    22    23    24    25    26    27    28    29    30    31    32    33    34    35    36    37    38    39
 1   165   154   183   174   216   233   258   200   223   212   122   303   184   170   249   207  -082   346   097   216
 2   324   363   335   329   364   195   051   050   058   121  -062   560   210   184   230   156  -051   707   135   325
 3   251   285   279   275   276   160   047   044   052   084  -038   497   172   084   172   119  -106   665   099   257
 4   311   339   255   286   301   147   013   013   043   085  -023   568   205   186   238   145  -017   673   124   271
 5   258   299   300   297   314   194   086   088   097   100  -056   537   192   138   196   128  -064   724   082   295
 6   290   325   357   333   380   226   133   165   211   245   093   519   186   247   270   213  -034   770   181   341
 7   270   313   353   350   364   294   196   178   223   254   105   590   220   192   258   211  -038 (986)   120   362
 8   288   348   384   347   388   290   205   202   233   250   112   540   226   221   248   210  -019   856   118   362
 9   176   208   303   282   316   455   339   358   316   354   249   373   204   182   208   258  -020 (745)   139   368
10   271   331   309   321   275   171   027   058   068   247  -016   120   082   476   238   218   055   124   522   326
11   357   295   289   337   257   124  -046  -018   103   215  -028   143   136   546   321   328   019   111   678   329
12   302   297   280   280   230   112  -068  -022   076   243  -049   093   125   582   310   288   025   049   698   272
13   362   344   334   338   252   105  -092  -045   065   231  -080   102   100   605   318   300   056   087   722   317
14   300   309   310   345   266   192  -018   012   114   292  -048   073   110   574   327   308   023   127   790   352
15   286   289   291   338   254   177  -028  -028   072   283  -022   037   100   568   292   283   023   102 (977)   340
16   314   324   361   370   307   194   004   037   123   277   005   058   106   584   320   326   037   099   825   366
17   090   121   174   192   152   394   102   070   150   231   052   003   019   225   141   143  -054   171 (625)   297
18   594   638   628   602   596   355   260   351   160   194   113   302   177   308   376   435   059   334   256   597
19   538   606   551   532   514   211   155   193   100   115   026   309   192   355   424   412   072   277   295   507
20   ---   560   544   548   534   229   207   276   088   129   039   281   228   328   429   475   029   258   270   518
21   560   ---   632   578   571   284   187   310   147   155   090   318   177   313   378   376   047   304   278   554
22   544   632   ---   632   639   393   320   409   178   195   120   278   190   338   380   416   083   351   293   629
23   548   578   632   ---   610 (454)   241   321   178   242   148   272   225   350   420   430   095   347   332 (949)
24   534   571   639   610   ---   349   329   407   175   199   137   300   213   302   364   400   084   360   252   596
25   229   284   395 (454)   349   ---   322   304   245   280   187   178   157   183   280   300  -012   340   237 (670)
26   207   187   320   241   329   322   ---   646   347   316   443   160   222   081   236   342   007   227   000   286
27   276   310   409   321   407   304   646   ---   322   264   464   167   198   090   234   359   067   212  -013   348
28   088   147   178   178   175   245   347   322   ---   370   428   152   175   207   194   259  -033   248   094   218
29   129   155   195   242   199   280   316   264   370   ---   392   109   084   331   201   216   021   275   291   290
30   039   090   120   148   137   187   443   464   408   392   ---   082   127   123   165   287   047   135  -009   170
31   281   318   278   272   300   178   160   167   152   109   082   ---   467   300   448   386   081   584   036   277
32   228   177   190   225   213   157   222   198   175   084   127   467   ---   302   497   504   147   230   094   230
33   328   313   338   350   302   183   081   090   207   331   123   300   302   ---   604   570   177   194   540   351
34   429   378   380   420   364   280   236   234   194   201   165   448   497   604   ---   806   125   259   283   427
35   475   376   416   430   400   300   342   359   259   216   287   386   504   570   806   ---   134   221   275   438
36   029   047   083   095   084  -012   007   067  -033   021   047   081   147   177   125   134   ---  -040   007   082
37   258   304   351   347   360   340   227   212   248   275   135   584   230   194   259   201  -040   ---   126   381
38   270   278   293   332   252   237   000  -013   094   291  -009   036   094   540   283   275   007   126   ---   359
39   518   554   629 (949)   596 (670)   286   348   218   290   170   277   230   351   427   438   082   381   359   ---

Variables 9, 17, 25, 37, 38, and 39, which were not used in the factoring, are 
experimentally dependent on variables 7, 15, and 23. The consequent 
spuriously high correlations are placed in parentheses in the table. 


Lawley’s Maximum Likelihood Method of Factor Analysis 


Factors were extracted by Lawley’s maximum likelihood method. Since 
this important method has not often been mentioned in this country and 
since it (or a modification) is likely to become widely used in the near future, 
some references are listed. The basic development was given in (12). Ex- 
tensions and further developments appeared in (13, 15, 1, 2, and 23). [A 
maximum likelihood Method II, avoiding the assumption of multivariate 
normality, was developed in (13). This second method will not be considered 
here, since the usual optimum properties of maximum likelihood estimates 
do not appear to hold (1, 11). Whittle (25) derived a relatively simple solution 
for a similar situation in the special case where the variables are of known 
reliability.] Henrysson (10) reported an empirical sampling study supporting 
Lawley’s test of significance. Bartlett (1, 2) gave a significance test that is 
superior to Lawley’s whenever the number of examinees is not large compared 
to the number of variables. Some recent papers (in English) by Bartlett, 
Lawley, and others appeared in (26). Very recently Rao (19) discussed the 
basic differences between Hotelling’s principal-component analysis and 
common-factor analysis and described further developments related to those 
of Lawley. 

Lawley’s method and Thurstone’s centroid method are both concerned 
with estimating common-factor loadings, specific-factor variance being 
systematically set aside. Certain characteristics of the maximum likelihood 
method are: 

1. The number of common factors is tentatively hypothesized in advance. 

2. The procedure in effect determines the population correlation matrix, having the 
hypothesized rank, for which the likelihood of occurrence of the observed sample in the 
course of random sampling is a maximum. The matrix of factor loadings exactly reproducing 
this matrix of population correlation coefficients is the basic result obtained by the maximum 
likelihood method. The result is obtained by iterative procedures. 

3. The usual matrix of residuals is computed; a rigorous large-sample significance test 
is made to determine whether or not the residuals may plausibly be attributed solely to 
sampling fluctuations in the correlation coefficients. 

4. If the residuals are statistically significant, the research worker repeats the fore- 
going process, starting with different tentative hypotheses as to the number of common 
factors required to explain the data, until he is ready finally to accept one of these hy- 
potheses. 

5. The usual problem of estimating the communalities ceases to be a serious cause for 
concern, since the maximum likelihood estimates of the communalities are one of the 
outcomes of the procedure. 
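Characteristics 2 and 3 above rest on two simple computations: the correlation reproduced by a set of orthogonal factor loadings, and the residual left when that value is subtracted from the observed correlation. The sketch below illustrates both, using the unrotated loadings of variables 2 and 5 as read from Table 4 and their observed correlation of .790 from Table 2. The function names are illustrative, and the iterative maximum likelihood estimation itself is not shown.

```python
# Unrotated loadings on factors I to X, as read from Table 4.
VERBAL_A = [0.65, 0.51, -0.25, -0.03, -0.17, 0.18, 0.01, 0.12, 0.10, -0.02]      # variable 2
VOCABULARY_M = [0.60, 0.55, -0.24, -0.04, -0.07, 0.14, 0.05, 0.04, 0.12, 0.05]   # variable 5
OBSERVED_R = 0.790   # correlation of variables 2 and 5 in Table 2


def reproduced_correlation(a, b):
    """Correlation implied by orthogonal loadings: sum of cross-products."""
    return sum(x * y for x, y in zip(a, b))


def communality(loadings):
    """Common-factor variance of one variable: sum of squared loadings."""
    return sum(x * x for x in loadings)


r_hat = reproduced_correlation(VERBAL_A, VOCABULARY_M)
residual = OBSERVED_R - r_hat
print(f"reproduced r = {r_hat:.3f}, residual = {residual:.3f}")
print(f"communality of variable 2 = {communality(VERBAL_A):.3f}")
```

The communality computed for variable 2 agrees with the value of .832 printed for it in Table 4.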


The practical application of the maximum likelihood method is discussed 
in (5, 16, 14). Until now, the method has not been applied to other than very 
small correlation matrices because of the large amount of computations 
required. From a computational point of view, Lawley’s method can be de- 
scribed essentially as equivalent to the task of finding the latent roots and 
vectors of a modified correlation matrix, the correlations being modified by 
dividing them by a simple function of the unknown latent vectors. 


Extraction of Factors 


The application of Lawley’s method to the actual data was carried out 
on Whirlwind I, a high-speed digital electronic computer. The computing 
program was written by the author with a view to minimizing the use of 
computer time in case convergence should require hundreds of iterations. A 
single iteration with this program required roughly 12 seconds, the time 
varying somewhat with the number (m) of factors hypothesized. 

The original hypothesis of the author suggested that m should be at least 
9 for the 33-variable matrix analyzed. However, application of Lawley’s 
method to the initial set of trial values for the factor loadings failed because 
the computations generated imaginary numbers. Extremely close initial 
approximations to the solution were necessary whenever m was at all large. 

The problem was dealt with as follows. Computations were first carried 
out with m = 4. Initial trial values of the factor loadings were arbitrary 
except that (a) calculated loadings on the first centroid factor were used for 
the first column, (b) the remaining trial values were selected so that the sum 
of squares of the trial values for any one variable was equal to the highest 
correlation with that variable. The iterations were successfully completed 
for m = 4. The resulting estimates of the factor loadings were used as the 
first four columns of the trial values needed to start the iterations with 
m = 5; the fifth column of these trial values was set up in accordance with 
informed guesses based on the matrix of residuals. The trial values for m = 6 
were set up in the same way from the results obtained with m = 5, and so 
forth. In every case after m = 4, the initial trial values proved to be close 
approximations to the corresponding final factor loadings. No further 
imaginary numbers were encountered. 

Convergence of the iterative process was rapid, as shown by the second 
column of Table 3. The criterion used for stopping iteration required that the 
largest discrepancy between the corresponding factor loadings produced by 
two successive iterations should remain less than .002 throughout ten 
successive iterations. 
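The stopping rule just described can be expressed as a short routine. The sketch below is one reading of the rule, not the Whirlwind I program: it assumes that "throughout ten successive iterations" means ten consecutive comparisons of adjacent iterations, each with a largest discrepancy below .002. All names are hypothetical.

```python
def max_discrepancy(a, b):
    """Largest absolute difference between corresponding loadings."""
    return max(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def iterations_until_converged(history, tolerance=0.002, run_length=10):
    """Return the iteration number at which the criterion is first met, else None.

    `history[k]` is the loading matrix after iteration k (history[0] holds
    the trial values).
    """
    streak = 0
    for k in range(1, len(history)):
        if max_discrepancy(history[k - 1], history[k]) < tolerance:
            streak += 1
            if streak >= run_length:
                return k
        else:
            streak = 0          # the discrepancy must stay small without a break
    return None


# Artificial example: one loading approaching 0.5, the error halving each time.
demo = [[[0.5 + 0.1 * 0.5 ** k]] for k in range(40)]
print(iterations_until_converged(demo))
```

Requiring a run of small changes, rather than a single one, guards against stopping at a point where the iterations merely pause before moving again.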

The matrix of residuals obtained with each value of m was tested for 
significance by means of Lawley’s chi-square test. Information about the 
progress of the computations and about the chi-square significance tests is 
given in Table 3. Although arguments could be advanced for extracting an 
eleventh factor, it was decided to stop with ten. 
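The degrees of freedom for the chi-square test depend only on the number of variables p and the number of factors m. The expression used below, ½[(p - m)² - (p + m)], is the standard one for Lawley's test and is not quoted in the text above; it is given here because it reproduces the degrees of freedom listed in Table 3 for p = 33. The function name is illustrative.

```python
def lawley_degrees_of_freedom(p, m):
    """Degrees of freedom of the large-sample chi-square test for m common factors."""
    # (p - m)**2 and (p + m) always have the same parity, so the division is exact.
    return ((p - m) ** 2 - (p + m)) // 2


for m in range(4, 11):
    print(m, lawley_degrees_of_freedom(33, m))
```

The number of independent correlations that the residuals can be tested against shrinks from 402 to 243 as m rises from 4 to 10.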

TABLE 3 

Tests of Significance and Other Information According 
to the Number (m) of Factors Hypothesized 

Number of       Number of                     Chi-Square     Degrees of     Probability
Factors         Iterations       Sum of       Calculated     Freedom        Level for
Hypothesized    Needed for       Latent       from           for            Chi-Square
(m)             Convergence      Roots        Residuals      Chi-Square

     4              35             61           2,605           402           < .01
     5              22             69             893           373           < .01
     6              23             75             662           345           < .01
     7              28             78             530           318           < .01
     8              26             80             436           292           < .01
     9              25             83             357           267           < .01
    10              28             88             284           243             .07


The orthogonal unrotated matrix of the maximum likelihood estimates 
of the factor loadings is given in Table 4. The communality for each test and 
the latent root for each factor are also shown. Each latent root is the weighted 
sum of the squares of the loadings on the corresponding factor, the weight 
for each test being the reciprocal of its uniqueness. 
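The definition of a latent root given above can be illustrated numerically. The loadings in the sketch below are hypothetical values for three variables and two factors, chosen only for illustration; they are not taken from Table 4. Uniqueness is taken as one minus the communality.

```python
# Hypothetical loadings (rows: variables, columns: factors); not data from the study.
LOADINGS = [
    [0.8, 0.3],
    [0.7, -0.4],
    [0.6, 0.1],
]


def latent_roots(loadings):
    """Weighted sums of squared loadings, each weight being 1 / uniqueness."""
    uniquenesses = [1.0 - sum(v * v for v in row) for row in loadings]
    n_factors = len(loadings[0])
    return [
        sum(row[j] ** 2 / u for row, u in zip(loadings, uniquenesses))
        for j in range(n_factors)
    ]


print([round(root, 3) for root in latent_roots(LOADINGS)])
```

A test with small uniqueness therefore counts heavily in a latent root, which is why these roots can greatly exceed the ordinary sums of squared loadings.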


Estimation of Unrotated Factor Loadings for Experimentally Dependent 
Variables 


The six variables in Table 4 with loadings enclosed in parentheses were 
not included in the 33-variable correlation matrix from which the factors 
were extracted. These loadings were estimated by the method briefly outlined 
in the following paragraphs. 

The usual factor equation, $R \approx FF'$ ($\approx$ is used to indicate approximate 
equality), may be written 

$$
\begin{bmatrix} P & Q \\ Q' & S \end{bmatrix} \approx \begin{bmatrix} G \\ H \end{bmatrix} \begin{bmatrix} G' & H' \end{bmatrix}, \tag{1}
$$

where P is the 33-variable matrix of correlations used for extracting the 
factors, Q is the matrix of the correlations of these thirty-three variables 
with the six variables that were omitted from P, S is the matrix of the 
intercorrelations of these six variables, G is the matrix of the factor loadings 
of the thirty-three variables and H is the matrix of the factor loadings of the 
six variables. Assuming that the entire matrix, R, has the same common 
factors as does P, it follows that G is the matrix of factor loadings obtained 
by analyzing P. H can then be determined from the equations 

$$
HG' \approx Q', \tag{2}
$$

$$
HH' \approx S. \tag{3}
$$

TABLE 4 


Unrotated Factor Coefficients 
(decimal points omitted) 

Variable                                                          Commu-
No.         I    II   III    IV     V    VI   VII  VIII    IX     X    nality
i 3h 16 09 -O4 17 08 -Oh -Oh -07 -02 192 
2 65 Sl -25 -03 <-17 18 o1 12 10 -02 832 
3 56 5. -2k -O4 -10 12 32 ok -06 -03 672 
4 61 46-25 -10 -12 08 = -05 00 «(+12 -02 691 
5 60 55 -2h -O4 -07 14 05 o4 12 05 TTh 
6 66 46-16 = =-03 13-10 00 Ol -02 03 699 
7 68 59 =-15 -O% 17. +~-09 Z -02 -O4 00 875 
8 67 56 -12 00 19 =~-19 -06 Ol -06 853 
9 (48) (34) (11) (06) (38) (-11) (03) (-04) (02) (-02) (519) 
10 4g +36 #116 val ol 02 «lk 05 13 Ol 4h6 
21. 57 -52 +22 -03 -02 11 -02 02 -15 O7 689 
12 55 -59 -26 -03 02 10 «11 <02 «03 13 765 
13 60 -58 -29 -Ol -02 -Ol -07 -03 -03 12 790 
14 60 -57 -26 Ol 10 O1 06 -03 -O1 -04 761 
15 58 -61 -31 02 pal Ol 12 = -02 02 <12 847 
16 60 -61 -2h 06 11 Ol 08 = -05 03 -05 826 
17 (30) (-24) (- 12) (07) (25) (-05) (28) (-02) (05) (-31) (406) 
18 62 02 2k 38-25 02 = =-O4 14 00 Ol 669 
19 59 «~-07 16 22 -31 -06 = -03 03 ~=-10 -12 556 
20 59 =-07 23 ob 8=29 <0) o4 Ol -13 12 580 
21 62 -02 17 37 #4«67 4 46«06 «= = 22 00. = -08 -O4 642 
22 64 -02 23 43-13 Ss -02~—s« 08 08 02 684 
23 63 = =-05 21 33 Dk Se 02 03 = -11 601 
2k 61 o4 24 39-11 -05 -O01 = -06 -O4 604 
25 (39) (04) (21) (19) (16) (-04) (11) (03) (53) (-13) (297) 
26 2h 12 52 26 39 22 @606~—sé=«-08 06 Ol 618 
27 28 10 55 38 32 21 ©O «15 06 09 696 
28 26 03 25 05 ho 02 «ll 13. «12 -02 358 
29 36 = =-09 08 09 48 oO) = -14 26 = -0l -09 483 
30 13 ok 43 09 kg 09 ~=—«-09 19° “salle -08 531 
31 58 39 12 -28 -09 1h -2h -19 -08 -11 718 
32 38 o4 3h .-32 86-03 Tu 266 «28. O1 -ll LL6 
33 64-1 10 -26 O7 +-ll ~-22 08 15 Ol 754 
3h 62 -15 48 -35 -12 -05 02 O1 03 -04 781 
35 60 -20 62 -28 -05 -01 33 O4 -O1 05 887 
36 05 =-09 15 -<08 <05 <1) <20° -=10 15 -09 127 
37 (66) (57) (- ad (-03) (20) (-07) (02) (-02) (-01) (00) (827) 
38 (57) (-58) (-29) (03) (14) (01) (14) (-02) (02) (-13) (807) 
39 (64) (-o4) (23) (32) (-07) (-09) (02) (04) (02) (-14) (600) 

Latent Roots   40.92 21.90 11.28 5.10 4.09 1.34 1.08 0.83 0.65 0.58 

Since (1) and (2) never hold exactly in practice, (2) represents an 
inconsistent set of simultaneous linear equations, there being more equations 
than unknowns. In practice, (3) is totally ignored. A least-squares (but not a 
maximum likelihood) approximate solution for (2) can be obtained (9) by 
postmultiplying both sides by $G(G'G)^{-1}$, the result being 

$$
H = Q'G(G'G)^{-1}. \tag{4}
$$

It seemed more appropriate, however, and also computationally easier, 
in the present case where maximum likelihood procedures had been employed, 
to post-multiply (2) by $S^{-2}G(G'S^{-2}G)^{-1}$, $S^2$ being the 33 × 33 diagonal 
matrix whose elements are the uniquenesses of the 33 variables in P (and 
thus distinct from the 6 × 6 matrix S of (1) and (3)). The result is 

$$
H = Q'S^{-2}G(G'S^{-2}G)^{-1}. \tag{5}
$$

A rigorous justification for (5) is not immediately available. Sufficient 
justification is apparent, however, when it is pointed out that in Lawley’s 
method of analysis $G'S^{-2}G$ is the diagonal matrix whose elements are the 
latent roots, and further that the basic equation of Lawley’s method can be 
written 

$$
G = (P - S^2)S^{-2}G(G'S^{-2}G)^{-1}. \tag{6}
$$


The best justification lies in the clarity of the results obtained, as will be 
seen presently. 
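Equations (4) and (5) differ only in the weights given to the thirty-three variables. The following sketch implements both in plain Python for small matrices. The data are hypothetical (four variables, two factors, one added variable), and every name is an assumption made for illustration; it is not the computing procedure used in the study.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def inverse(a):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[pivot][c]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        scale = aug[c][c]
        aug[c] = [v / scale for v in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def estimate_h(q, g, uniqueness=None):
    """Equation (4) when `uniqueness` is None; equation (5) otherwise."""
    weights = [1.0] * len(g) if uniqueness is None else [1.0 / u for u in uniqueness]
    weighted_g = [[w * v for v in row] for w, row in zip(weights, g)]    # S^-2 G
    return matmul(matmul(transpose(q), weighted_g),
                  inverse(matmul(transpose(g), weighted_g)))


# Hypothetical loadings G and a known H, from which Q = G H' is built exactly.
G = [[0.8, 0.3], [0.7, -0.4], [0.6, 0.1], [0.5, 0.5]]
H_TRUE = [[0.5, 0.2]]
Q = matmul(G, transpose(H_TRUE))
UNIQUENESS = [1.0 - sum(v * v for v in row) for row in G]

print(estimate_h(Q, G))                 # unweighted, equation (4)
print(estimate_h(Q, G, UNIQUENESS))     # weighted, equation (5)
```

When Q is exactly consistent with the loadings, as in this artificial case, both equations return the same H; they differ only when (2) cannot be satisfied exactly, as with real data.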


Rotation 


The rotation of the original factor matrix toward psychologically 
meaningful oblique factors was carried out on the matrix rotator at The 
Adjutant General’s Office. Extensive final rotations were made by desk 
calculator. Variables 37, 38, and 39 (scores not “corrected for guessing”) 
were not available during the rotations. 

The main guiding principle in all rotations was psychological meaningful- 
ness, as interpreted according to the notions of the writer. The facility with 
which rotations could be made on the matrix rotator encouraged persistence 
in the ultimately unsuccessful attempt to find an arithmetic-reasoning 
speed factor. A total of 497 rotations were carried out, each involving the 
shift of only one axis. 

Table 5 gives the orthogonal projections of the thirty-nine variables on 
the reference axes—frequently referred to as the “loadings on the rotated 
factors.” Since the term “factor loading” has been used with various meanings 
in oblique analyses, these projections will hereafter be referred to as “factor 
coefficients.” Table 6 gives the transformation matrix for rotating Table 4 
into Table 5. Table 7 gives the intercorrelations among the primary vectors. 
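The relation among Tables 4, 5, and 6 can be checked on a few entries: a rotated coefficient is the sum of products of a variable's unrotated loadings (a row of Table 4) with a column of the transformation matrix (Table 6). The sketch below does this for the verbal reference axis, using values as they are read from the printed tables; the variable and function names are illustrative only.

```python
# Unrotated loadings on factors I to X, as read from Table 4.
UNROTATED = {
    "1. Word Fluency (R)": [0.34, 0.16, 0.09, -0.04, 0.17, 0.08, -0.04, -0.04, -0.07, -0.02],
    "2. Verbal (A)":       [0.65, 0.51, -0.25, -0.03, -0.17, 0.18, 0.01, 0.12, 0.10, -0.02],
    "31. English grade":   [0.58, 0.39, 0.12, -0.28, -0.09, 0.14, -0.24, -0.19, -0.08, -0.11],
}

# First column of the transformation matrix (Table 6), defining axis I (V).
AXIS_V = [0.415, 0.647, -0.389, -0.308, 0.132, 0.273, 0.007, -0.049, 0.261, 0.013]


def rotated_coefficient(loadings, axis):
    """Orthogonal projection of a variable on one reference axis."""
    return sum(g * t for g, t in zip(loadings, axis))


for name, loadings in UNROTATED.items():
    print(f"{name:22s}{rotated_coefficient(loadings, AXIS_V):6.2f}")
```

The three projections come out at about .25, .75, and .54 for Word Fluency, Verbal (A), and the English grade respectively.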

TABLE 5 


Rotated Factor Coefficients 
(decimal points and initial zeros omitted) 

                                       V    S    M    N    P    v    s    G    E    X
                                       I   II  III   IV    V   VI  VII VIII   IX    X

1. Word Fluency (R) 25 8 -3 So TT a. 9 ul -5 
2. Verbal (A) Tor 2 iS So § «69 0 6 -6 12 
3, Vocabulary (L) oy 2 2 — 3S & 6 2 2 -8 
4, dg i 65 4& 5 9 7 8 4 13 -6 
56 e “ i aa § a eee 3 -9 10 
6. # s 65 & 2 => 2 OF =5 0-1 2 
Ts y (s) 7! <i <1 elk) <0 IAG _ oy 0 2 3 
8. Wy = 6p => 0 “l =-5 39 =5 <2 =2 2 
9. “s LIA) 38 1 3 1 4 ge 95 0” oO 2 
10. Spatial Relations (A) hoWB 5 6 2 =6 2 -5 -5 20 
ll. Intersections (L) 2 6 5 -3 & «2 3 2 6-11 
12. be (L) 22 Te alk 5 ak Oo =k = <2) 3 
13. 4 tn -2 67 3 -0 -7 7 -1 =e: ee) 2 
14, # Ss -1 67 O 0 -2 2 20 -1 2 -l 
15. e ts} -2 69 -1 =2 =], -=3 30 soe te OF 
16. " Ss) -;§ 6 O ik 5 1 =e «i 2 
17. ° (LIA) 3 29 -2 al 5? oe Bg a2 2 <3 
18. Mathematics (A) 2 -k 50 6 9 -9 -2 a2 29 0 
19. Arithmetic Reasoning (L) -2 -1 50 9 9 -5 8 7 8 -8 
20. ss . L -k & ks iocke dp at 6 -3 -17 
22. . : M “1 O 51 2. P- 20-5ek 60 §f -=5 
22. y : s 1 .2 ko 7 =9 <9 3 8 -5 5 
25. " " s) -1 oO 46 O 6 “2045 0 O -2 
oh, se vi Ss) © <L 38 16 -k 3 #7 a} =1 4 
25. e ? LIA) 8 1 16 8 5 2 2 h -5 3 
26. Number Speed (R) 3 2 =3 hO” 2626 be. Qs 
Ad " " {R) — 2s 48 -k 82 <3 say GO <6 
28. Cancellation (R) i 9% 2 0 27 10 -3 2 5 -5 
29. Picture Discrimination (R) Gr 25° 2 5 35 2 4& -5 -k 10 
30. Number Checking (R) “7 -lL 5 e536 O22 6 6 -8 
31. English (G) 5h -3 «2 > 6 2 <6 36 35 #1 
32. Foreign Language (G) 20 2 «2 9 -5 -5 5 51 2h O 
33. Eng'g. Draw. & Des. Geom. (G) 6 36 8 8 5 4 -6 38 -k 23 
34. Chemistry (G) 6 1 26 8 -1 O 2 GT 2 «<2 
35. Mathematics (G) -5 2 28 sl s&§ 2 0 66 -7 -12 
36. Conduct (G) -8 -7 6 2 -3 -l -2 15 8 al 
37. Vocabulary (NR) 70 -1 -3 0 3 2 O r 2 2 
38. Intersections (NR) -1 67 -2 aL. 0: «2 32 aa" 2 Oo 
39. Arithmetic Reasoning (NR) S 2 NS D <8 0 36 2 «1... 2 


TABLE 6 


Transformation Matrix 








          I     II    III     IV      V     VI    VII   VIII     IX      X 





T 415 324 192 020 009 109 O53 181 O11 O07 

II 647 -561 -O47 008 030 141 -119 -056 015 003 
TIr -389 -448 352 129 038 -060 -078 497 000 -086 
Iv. -308 -036 337 295 035 -031 082 -784 -082 002 

V 132 464362 -510 161 4167 294 o91 -192 005 O34 
VI 273 «= 408 «~-416 0 = 5525 ss-s«~-684 «=s«030—«éO5.—ié‘zd1'72”s«é«S 020 
VII 007 002 -029 056 -350 059 532 090 -392 -k32 
VIII -049 -189 357 -497 650 -345 -112 026 -5h0O 089 
Ix 261 -O42 -336 47k -hog -259 O81 183 -520 888 

x 013 «4207 ~+-223 «49346 -h93 468 -811 -145 -497 -092 





TABLE 7 


Correlations between Primary Vectors 





              V     S     M     N     P     v     s     G     E     X 
              I    II   III    IV     V    VI   VII  VIII    IX     X 














V zi 3:00 22% . sid 00 =.11 ~=-.08 <.05 @-.11 .03 -.13 
Ss 77} 33% 32300. <9 -.08 -.08 .05 -.02 7 =201, 320 
M III |; .44 .49 1.00 +29 -.0h .13  .03 1300 009 9) 
N 7 1 300 .-308 329 2300 271 <6 <8 40 -.18 -.09 
P V | -.12 -.08 ~-.0b s71 12500 «33660 0-88 40 <=.51 <OL 
Vv VI | -.08 .05 .13 56 366 2300 <4% i> adh |= (ET 
s VII | -.05 -.02 .03 26 <20 | «61400 sae «322 42 
G Ofais 1 -=22 27 25 Se Mme. emer: 2.00 =32 61 
H 41 <5 <301 309 -18 -.31 <=-.14 -.22 =a? 2500 <27 
X x 1 =gas  o20 229 2209. 202....227 512 301,27 1:00 





If the last one or two factors are excluded from consideration, the clarity 
of the factor structure in Table 5 is made apparent by the visually obvious 
distinction between 2-digit and 1-digit coefficients. The 1-digit coefficients 
may be conveniently dismissed as insignificant. Each 2-digit coefficient without 
exception has an obvious realistic interpretation. 

In most factor analyses it is customary to ignore coefficients less than 
.30 or .20, say, as not reliably different from zero. Standard errors for individual 
factor coefficients have not been computed for the present study; however, 
with correlations based on 649 cases, as in the present study, the standard 
error of a correlation coefficient is about .04 for correlations in the neighbor- 
hood of zero and about .01 for correlations in the neighborhood of .80. It is 
to be expected, therefore, that the factor coefficients will have some meaning 
even in the range from .10 to .20. This will be seen to be actually the case. 
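The two standard errors quoted above can be recovered from the familiar large-sample approximation (1 - r²)/√N for the standard error of a product-moment correlation. The formula is not stated in the text and is supplied here as an assumption; the function name is illustrative.

```python
from math import sqrt


def correlation_standard_error(r, n):
    """Large-sample standard error of a product-moment correlation (assumed formula)."""
    return (1.0 - r * r) / sqrt(n)


N_CASES = 649
for r in (0.0, 0.80):
    print(f"r = {r:.2f}: standard error = {correlation_standard_error(r, N_CASES):.3f}")
```

With 649 cases the approximation gives about .039 near zero and about .014 near .80, in line with the rounded values of .04 and .01 cited.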


Interpretation of Factors 


The first three factors of Table 5 correspond to the three aptitude areas 
about which the present study is centered. They are “level” factors, in con- 
trast to the next four, which are speed factors. The eighth and to a large 
extent the ninth factors are determined by academic grades. The tenth and 
last factor seems to have no simple interpretation. All these factors will now 
be discussed in more detail. 

Factor I(V) is the verbal factor. In addition to the experimental vocabulary 
tests, the following variables have two-digit coefficients for this factor, as 
would be expected: 


2. Verbal Test (A) .75 
31. English Grade .54 
1. Word Fluency (R) .25 
32. Foreign Language Grade .20 


Factor II (S) is a space factor. In addition to the experimental inter- 
sections tests, the following variables have two-digit coefficients for this factor, 
as would be expected: 


10. Spatial Relations (A) .43 
33. Engineering Drawing and Descriptive Geometry Grades .36 
29. Picture Discrimination (R) .23 


The picture discrimination test is a reference test for the perceptual-speed 
factor, but the test obviously requires also the ability to perceive and dis- 
criminate spatial patterns. 

Factor III (M) is a mathematical-reasoning factor. In addition to the 
experimental arithmetic-reasoning tests, the following variables have two- 
digit coefficients for this factor, as would be expected: 


18. Mathematics (A) .50 
35. Mathematics Grade .28 
34. Chemistry Grade .26 


Factor IV (N) is the number-speed factor, determined by the two reference 
tests included for this purpose. The only other variables with two-digit 
coefficients for this factor are two of the speeded arithmetic-reasoning tests: 


22, 24. Arithmetic Reasoning (speeded) .17, .16 


Factor V (P) is the perceptual-speed factor, determined by the three 
reference tests included for this purpose. No other variables have two-digit 
coefficients for this factor. 

Factor VI (v) is clearly the verbal-speed factor that the present analysis 
was designed to isolate (if it actually existed) and to study. All the two-digit 
coefficients for this factor are listed below: 


8, 7, 6. Vocabulary (speeded) .39, .30, .27 
9. Vocabulary (last item attempted) .28 
37. Vocabulary (speeded; number-right score) .28 
28. Cancellation (R) .10 


The cancellation test is a reference test for the perceptual-speed factor. The 
coefficient of .10 for this test on the verbal-speed factor is not large enough 
to be of interest; a positive coefficient might be expected, however, in view of 
the fact that this test requires rapid work with alphabetical and verbal 
symbols. 

Factor VII (s) is clearly the spatial-speed factor that the present study 
was designed to isolate (if it actually existed) and to study. All the two-digit 
coefficients for this factor are listed below: 


17. Intersections (last item attempted) AY 
38. Intersections (speeded; number-right score) .32 
15, 16, 14. Intersections (speeded) .30, .23, .20 
25. Arithmetic Reasoning (last item attempted) .20 
39. Arithmetic Reasoning (speeded; number-right score) .16 
23. Arithmetic Reasoning (speeded) .13 


The fact that all of the speed scores on the arithmetic-reasoning tests have 
small positive loadings on the spatial-speed factor is consistent with the fact 
that the arithmetic-reasoning tests contain a considerable proportion of 
simple geometry and other items that involve graphic illustrations, these 
being printed in the test booklets alongside the items. 

Factor VIII (G) is an academic-grades factor. No variables other than the 
six academic grades have two-digit coefficients for this factor. 

Factor IX (#) appears to be some sort of verbal-academic-grade factor, 
as indicated by its two-digit coefficients, which are as follows: 


31. English Grade .35 
32. Foreign Language Grade .24 
4. Vocabulary (level) .13 
1. Word Fluency (R) 4 


Factor X (X) does not suggest any ready interpretation. 


The Correlations among Factors 


The correlations among the primary vectors in Table 7 are of paramount 
interest. First, it should be pointed out that the reference axis for the verbal 
factor was arbitrarily set approximately orthogonal to the reference axis for 
the verbal-speed factor, since it was felt that interpretation would be hindered 
by a choice of reference axes that would give the speeded verbal tests loadings 
of zero on the verbal factor. For the same reason, the spatial-factor axis was 
set roughly orthogonal to the spatial-speed axis. In each of these cases, the 
correlations between the speeded tests and the corresponding level factor 
are therefore represented approximately by the factor coefficients of the 
speeded tests and not by the corresponding near-zero correlation in Table 7. 

Because of the considerable indeterminacy as to the proper position of 
the primary vector for the ninth factor, the verbal-factor axis and the 
academic-grades axis were both set approximately orthogonal to the axis for 
the ninth factor. 

The mathematical-reasoning factor shows correlations of .44 and .49 
with the verbal and spatial factors, respectively. These correlations are 
reasonable in view of the fact that the arithmetic-reasoning tests include 
verbally presented problems, geometry problems, and other graphically 
presented problems. The only other correlations in Table 7 as large as these 
are between various speed factors. In fact, the main thing about Table 7 is 
the consistently positive intercorrelations of the four speed factors that 
have been isolated. In general, these correlate much more highly with each 
other than they do with the three “level” factors, thus demonstrating the 
existence of a second-order general speed factor. 


The Relation of Grades to Speed 


The academic-grade factor is seen from Table 7 to be positively cor- 
related with all four of the speed factors. The ninth factor, however, which 
is determined mainly by grades in English and in Foreign Language, has 
negative correlations with each of the four speed tests and with the academic- 
grade factor itself. In order to interpret the relation of course grades to the 
various speed factors, it is necessary to obtain the actual correlations of 
each of the grades with the primary vectors for each of the speed factors. 

As shown in Table 8, each of the course grades, with one minor exception, 
is positively correlated with each of the four speed factors. Although these 
relationships are not high, there is clear evidence of a positive relation between 
grades at Annapolis and speed. 


TABLE 8 

Correlations between Course Grades and Primary Vectors 

                                        V    S    M  |   N    P    v    s  |   G    #    X 
                                        I   II  III  |  IV    V   VI  VII  | VIII  IX    X 

31. English (G)                        61   12   39  |  20   10   15  -10  |  36   35   41 
32. Foreign Language (G)               18   13   24  |  26   12   17   04  |  54   20   08 
33. Eng'g. Draw. & Des. Geom. (G)      09   68   46  |  09   15   30   02  |  54  -00  -05 
34. Chemistry (G)                      17   36   51  |  28   16   28   07  |  76  -02   Ob 
35. Mathematics (G)                    08   35   51  |  43   27   35   11  |   8  -15   02 

Discussion 


No speed factor for the arithmetic-reasoning tests could be isolated, 
although the attempt was persistently made. The factor coefficients indicate 
that the speeded arithmetic-reasoning tests tend to involve the number- 
speed factor, the verbal-speed factor, and the spatial-speed factor to a slightly 
greater extent than do the unspeeded tests, as might reasonably be expected. 
The picture is somewhat confused by the fact that test 23 behaves slightly 
differently from the parallel tests 22 and 24, and test 19, from the parallel 
test 20. It may be that an arithmetic-reasoning factor exists in the data but 
is so very unimportant that it was not separated from “noise” in the analysis. 

The results obtained for the last-item-attempted scores are of particular 
interest. It will be remembered that these scores were not used to determine 
the common-factor space, since it was desired to center the present study 
primarily around the type of scores normally used for aptitude tests. It is 
nevertheless found, for both the verbal and the spatial tests, that the LIA 
score is a purer measure of the corresponding speed factor than are the 
corrected-for-guessing scores on any of the three speeded tests. 

It is noteworthy that in all three cases the “moderately” speeded tests 
(M) are like the level tests and not like the speeded tests, even though only 
50 to 75 per cent of the examinees responded to the last item. 

Variables 37, 38, and 39, the number-right scores corresponding to 
variables 7, 15, and 23, have loadings so similar to the “corrected-for-guessing” 
scores on the same tests as to be virtually indistinguishable from the 
latter. 

With the exception of English, the academic grades all have higher 
loadings on the academic-grade factor than they do on any of the aptitude 
factors. This situation clearly shows that the course grades have a reliable, 
and therefore theoretically predictable, variance over and above that actually 
predicted by the aptitude tests. Whether this variance is attributable to 
personality factors or to other causes cannot be determined from the present 
study. 


Summary and Conclusions 


The present study was designed to investigate the existence and inter- 
relations of various speed factors, and their relation to academic course 
grades. 

Speeded and unspeeded, but otherwise parallel, tests of vocabulary, 
spatial ability, and arithmetic reasoning were administered to 649 entering 
students at the U. S. Naval Academy at Annapolis. Also included in the 
factorial analysis were scores on certain regular admissions examinations, 
scores on certain specially prepared reference tests, and end-of-year course 
grades at Annapolis. 

Extraction of factors from the 33-variable correlation matrix was carried 
out by Lawley’s maximum likelihood method, the calculations being done 
on the Whirlwind, a high-speed electronic computer. Factoring was con- 
tinued until, after the extraction of the tenth factor, a significance test on 
the matrix of residuals showed them to be no longer statistically significant. 

Rotation to psychologically meaningful oblique axes was carried out 
with the help of the matrix rotator at The Adjutant General’s Office. The 
tenth rotated factor was found to be difficult or impossible to interpret. 
With this exception, the structure of the factor matrix was found to be so 
clear that a ready interpretation existed for every factor coefficient above .09. 

As would be expected, three of the factors obtained were verbal, spatial, 
and mathematical-reasoning factors. The reference tests included in the 
battery yielded the expected number-speed factor (ordinarily called simply 
the number factor) and perceptual-speed factor. The academic grades in the 
battery were found to define not only a general academic-grade but also a 
verbal-academic-grade factor. Finally, a verbal-speed and a spatial-speed 
factor were clearly identified and distinguished from the number-speed 
and the perceptual-speed factors. No arithmetic-reasoning speed factor was 
isolated. 

The primary vectors for all four speed factors were found to be positively 
intercorrelated, demonstrating the existence of a general speed factor at the 
second-order level. 

All correlations between course grades and the four speed factors, with 
one small exception, were found to be positive, although not large. It is to be 
concluded that speed of various kinds plays some part in the course grades 
studied, and that speededness in the admissions examinations is to this 
extent justified. It would seem that tests on which 50 to 75 per cent of the 
examinees reach the last item do not involve the speed factors needed; 
apparently, only very highly speeded tests involve these factors to any 
appreciable extent. 


OPTIMAL TEST LENGTH FOR MAXIMUM DIFFERENTIAL 
PREDICTION* 


Paul Horst 
UNIVERSITY OF WASHINGTON 


For the case of a single criterion a method is already available for deter- 
mining the optimal distribution of testing time for a battery of predictors, 
assuming that intercorrelation, validity, and reliability data are available for 
predictors of arbitrary lengths. In this article a modification and generaliza- 
tion of the method is presented for the case of differential prediction involving 
a number of criterion variables. A numerical example is given to illustrate the 
method, after which the mathematical rationale is outlined. 


I. The Problem 


In (2) the importance of techniques for predicting success differentially 
in each of a number of different activities from a single battery of predictors 
was discussed. It was assumed that intercorrelations for a large battery of 
predictor variables were available and also correlations between these 
predictors and a large number of criterion variables. The problem was to 
select from this larger battery of predictors that subset of specified size 
which would yield the maximum index of differential prediction for the 
criterion variables. The index of differential prediction efficiency was taken 
to be a simple function of the average of the variances for the predicted differ- 
ence scores for all possible pairs of criterion variables. The larger this average 
variance the greater the differential prediction efficiency of the battery. It 
was shown that this index is equivalent to the difference between the average 
variance of the predicted criterion measures and the average of their co- 
variances, assuming standard measures for both predictors and criteria, and 
that the predicted criteria are the “least squares” estimates. A method for 
selecting that subset of predictors of specified size which would yield the 
maximum index of differential prediction was presented. 
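
The equivalence described in this paragraph can be illustrated numerically. The sketch below takes a small covariance matrix C of predicted criterion measures (the values are hypothetical, chosen only for illustration) and evaluates the index in the form later written as (20), tr C minus 1'C1/N. It then checks that this equals the summed variances of all predicted difference scores divided by 2N, and also N - 1 times the difference between the average variance and the average covariance. The function names are illustrative and do not come from the article.

```python
def diff_prediction_index(C):
    """Index in the form trace(C) - (sum of all elements of C) / N."""
    n = len(C)
    trace = sum(C[i][i] for i in range(n))
    total = sum(sum(row) for row in C)
    return trace - total / n


def summed_difference_variances(C):
    """Sum over all ordered pairs (i, j), including i = j, of Var(y_i - y_j)."""
    n = len(C)
    return sum(C[i][i] + C[j][j] - 2.0 * C[i][j]
               for i in range(n) for j in range(n))


# Hypothetical covariance matrix of three predicted criterion measures.
C = [[0.30, 0.20, 0.10],
     [0.20, 0.25, 0.05],
     [0.10, 0.05, 0.40]]
N = len(C)

phi = diff_prediction_index(C)
from_differences = summed_difference_variances(C) / (2 * N)

avg_variance = sum(C[i][i] for i in range(N)) / N
avg_covariance = sum(C[i][j] for i in range(N) for j in range(N) if i != j) / (N * (N - 1))
from_averages = (N - 1) * (avg_variance - avg_covariance)

print(phi, from_differences, from_averages)
```

All three quantities agree, which is the sense in which the index is "a simple function" of the average variance of the predicted difference scores.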

The method referred to tacitly assumes that all predictors in the battery 
take the same amount of administration time, so that all subsets of the same 
size would also take the same amount of administration time. Usually this 
will not be the case. A more general approach to the problem might be to 
start with a given battery of predictor variables and inquire how the 
administration time for each predictor should be altered so that for a specified 
over-all testing time the index of differential prediction efficiency will be a 
maximum. This approach would allow for increasing the length of an experimental 
battery as well as for decreasing it. 

*This research was carried out under Contract Nonr-477(08) between the University 
of Washington and the Office of Naval Research. Most of the computations were carried out 
by Robert Dear, Charlotte MacEwan, and Donald Mills. Much credit is due the _ 
Elizabeth Cross. Supervision of both computational and editorial activities was provided 
by William Clemans. To each of these able contributors I am deeply grateful. 

As a matter of fact, for the case of a single criterion a method is already 
available (1) for determining the optimal distribution of testing time for a 
battery of predictors, assuming that intercorrelation, validity, and reliability 
data are available for predictors of arbitrary lengths. It is the purpose of this 
article to present a modification and generalization of the method for the 
case of differential prediction involving a number of criterion variables. 

In this presentation testing time is taken to be the time actually allotted 
the examinee for taking the test. A more complete analysis must also take 
into account the time for reading instructions, practice exercises, passing 
out and collecting papers, etc. The method will first be described and illus- 
trated by a numerical example, after which the mathematical rationale will be 
presented. 


II. Numerical Example 

The predictor variables used in this example are: 

(1) Guilford-Zimmerman Aptitude Survey, Part I, Verbal Comprehension 
(2) Guilford-Zimmerman Aptitude Survey, Part III, Numerical Operations 
(3) Guilford-Zimmerman Aptitude Survey, Part VII, Mechanical Knowledge 
(4) A. C. E. Psychological Examination, Quantitative Reasoning 
(5) A. C. E. Psychological Examination, Linguistic Reasoning 
(6) Cooperative English Test (Form OM), Usage 


The matrix of test intercorrelations with reliabilities in the diagonal is given 
in Table 1. The criterion variables are grade-point averages in each of ten 
college subjects. The matrix of validity coefficients is given in Table 2. 

The over-all testing time for the tests of arbitrary length is 142 minutes. 
We assume that this time is to be cut in half so that the over-all testing time 
is 71 minutes. The problem is to determine the time to be allotted to each 
test so as to maximize the index of differential prediction efficiency. 

The traditional assumptions are used here as in (1) with respect to the 
effect of test length on correlation and will not be repeated. Altering the 
administration time for any test will, of course, alter the number of items. 
In the following discussion, the terms test length and testing time are used 
synonymously. 
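
The "traditional assumptions" are not restated in the article. A common reading, assumed in the sketch below, is the Spearman-Brown relation for reliability together with the matching correction for validity when a test's length is multiplied by a factor k. The function names are illustrative only. The example values (.920 and .370) are the reliability of test 1 and its validity for the first criterion, taken from Tables 1 and 2.

```python
from math import sqrt


def lengthened_reliability(reliability: float, k: float) -> float:
    """Spearman-Brown reliability of a test whose length is multiplied by k."""
    return k * reliability / (1.0 + (k - 1.0) * reliability)


def lengthened_validity(validity: float, reliability: float, k: float) -> float:
    """Validity of the same test after its length is multiplied by k."""
    return validity * sqrt(k / (1.0 + (k - 1.0) * reliability))


# Test 1 (reliability .920, validity .370 for criterion 1) cut to half length.
half_reliability = lengthened_reliability(0.920, 0.5)
half_validity = lengthened_validity(0.370, 0.920, 0.5)
print(round(half_reliability, 3), round(half_validity, 3))
```

With k = 1 both functions return their inputs unchanged, which is a quick sanity check on the formulas.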

The method of solution for the new test lengths involves a series of 
successive approximations. For large numbers of predictor and criterion 
variables the solution may become very laborious. It is probable that the 
solution would be greatly expedited by the use of high-speed computing 
equipment. Further research may yield more efficient computational procedures. 


TABLE 1 

R Matrix of Predictor Intercorrelations with Reliabilities 
Substituted for Unities in the Diagonal: $R = r - D_u$ 

                 1       2       3       4       5       6       Σ 
1  G-Z 1       .920    .159    .152    .281    .763    .515    2.790 
2  G-Z 3       .159    .920    .003    .369    .292    .243    1.986 
3  G-Z 7       .152    .003    .920    .200    .142   -.150    1.267 
4  ACE-Q       .281    .369    .200    .820    .549    .426    2.645 
5  ACE-L       .763    .292    .142    .549    .830    .628    3.204 
6  English     .515    .243   -.150    .426    .628    .860    2.522 
   Σ          2.790   1.986   1.267   2.645   3.204   2.522   14.414 


TABLE 2 

The $r_c'$ Matrix of Validity Coefficients 

                        1       2       3       4       5       6 
                      G-Z 1   G-Z 3   G-Z 7   ACE-Q   ACE-L  English     Σ 
 1  Anthropology      .370    .177    .091    .294    .341    .357    1.630 
 2  Chemistry         .317    .274    .016    .309    .364    .399    1.679 
 3  Economics         .339    .211    .008    .241    .334    .323    1.456 
 4  English           .526    .247   -.075    .262    .488    .524    1.972 
 5  Foreign Lang.     .295    .287   -.156    .200    .232    .426    1.284 
 6  Geology           .184    .140    .094    .170    .229    .214    1.031 
 7  History           .379    .169   -.001    .182    .373    .336    1.438 
 8  Mathematics       .287    .348   -.088    .350    .336    .401    1.634 
 9  Psychology        .440    .170    .096    .285    .409    .403    1.803 
10  Zoology           .336    .216    .031    .318    .345    .351    1.597 
    Σ                3.473   2.239    .016   2.611   3.451   3.734   15.524 
    Σ/10              .347    .224    .002    .261    .345    .373    1.552 

1. The first computational step is to calculate a matrix $a_c'$ from the 
matrix $r_c'$ in Table 2. The elements of $a_c'$ in Table 3 are the corresponding 
elements of Table 2 with their column means subtracted. Hence the columns 
of Table 3 all add to zero, within rounding error. 

2. The next step is to compute the elements for a diagonal matrix, $A$. 
The $i$th element is the product of the original length of test $i$ multiplied 
by one minus its reliability. The elements for $A$ are given in row 4, labeled 
$1'A$, in Table 4. For the first element we have $A_1 = 25(1.00 - .92) = 2.000$. 

3. A first approximation is now required for the altered test lengths. 
We assume the new test lengths to be proportional to the original test lengths. 
Therefore, as a first approximation to the new test lengths we take one half 
the original test lengths. Row 1 in Table 4 gives the original test lengths. 
Row 2 of the same table is one half the first. 


TABLE 3 

The $a_c'$ Matrix: Validity Coefficients Expressed 
in Deviation Form for Each Test 

          1       2       3       4       5       6 
 1      .023   -.047    .089    .033   -.004   -.016 
 2     -.030    .050    .014    .048    .019    .026 
 3     -.008   -.013    .006   -.020   -.011   -.050 
 4      .179    .023   -.077    .001    .143    .151 
 5     -.052    .063   -.158   -.061   -.113    .053 
 6     -.163   -.084    .092   -.091   -.116   -.159 
 7      .032   -.055   -.003   -.079    .028   -.037 
 8     -.060    .124   -.090    .089   -.009    .028 
 9      .093   -.054    .094    .024    .064    .030 
10     -.011   -.008    .029    .057    .000   -.022 
Ck      .003   -.001   -.004    .001    .001    .004 
 Σ      .003   -.001   -.004    .001    .001    .004 


TABLE 4 

Computation of $1'D_{b_1}$ and $1'AD_{b_1}^{-1}$ 

First approximation: $1'D_{b_1} = \dfrac{T_b}{1'D_a1}\,1'D_a$ 

                                   1       2       3       4       5       6      Ck       Σ 
1  $1'D_a$                       25.0     9.0    30.0    23.0    15.0    40.0           142.0 
2  $1'D_{b_1} = .5(1'D_a)$       12.5     4.5    15.0    11.5     7.5    20.0    71.0    71.0 
3  $1'D_{b_1}^{-1}$              .080    .222   .0667    .087   .1333    .050 
4  $1'A$                        2.000    .720   2.400   4.140   2.550   5.600          17.410 
5  $1'AD_{b_1}^{-1}$             .160    .160    .160    .360    .340    .280           1.460 


4. Calculate the reciprocals of the $D_{b_1}$ elements. These are given in 
row 3 of Table 4. 

5. Calculate the product of each $A$ value in row 4 of Table 4 by the 
corresponding value immediately above it. The products are entered in row 5 
of Table 4. For example, the first value is .160 = 2.000 × .080. 

6. Next the elements calculated in step 5 are added to the corresponding 
diagonal elements of Table 1, and the table is copied into the upper left 
quadrant of Table 5. The first diagonal element is 1.080 = .160 + .920. 
Note that the elements below the diagonal are not copied in. The upper 
right section of Table 5 is $a_c$, the transpose of Table 3. 

7. We next calculate a matrix $L_1$ by premultiplying the matrix $a_c$ by 
the inverse of the matrix in the upper left quadrant of Table 5. The computations 
for the forward solution are given in the two lower quadrants of 
Table 5 and in Table 6. The back solution is given in Table 7. The procedure 
for multiplying a matrix by the inverse of a symmetric matrix is outlined in 
(3). 

8. The second approximation to the new test lengths is computed in the 
lower section of Table 7 as follows: 

Row a consists of the sum of squares of column elements of the $L_1$ matrix. 
For example, the first element in row a, namely, .0626, is the sum of squares 
of the first ten elements in column 1 of Table 7. 

Row b is copied from row 4 of Table 4. 

Row c consists of the products of corresponding elements in the two preceding 
lines. For example, .1251 = .0626 × 2.00. 

Row d consists of the square roots of corresponding entries in the preceding 
line. For example, .3537 = √.1251. The computations to the right 
of this line and designated s are obtained by dividing the over-all new testing 
time, 71 minutes, by 1.8823, the sum of the elements in the row. This gives 
s = 37.7198. 

Row e is a check row. Each element in the second line above it is divided 
by the element immediately above. Thus .1251/.3537 = .3537. 

Row f is obtained by multiplying each element in row d by s. For 
example, the first element is 13.3415 = .3537 × 37.7198. This line gives the 
second approximation to the new test lengths. 

Row g is obtained by dividing each element in the preceding row into 
the corresponding value in row b. For example, the first value is .150 = 
2.00/13.3415. 

Row h is a check on the preceding row. Each element in row g is multiplied 
by the corresponding element in row f to give row h, which should 
correspond, within limits of rounding errors, to row b. For example, the 
first element is 2.001 = .150 × 13.3415. 

Row i is obtained by adding the elements in row g to the corresponding 
reliabilities. For example, the first element is 1.070 = .150 + .920. 

9. A new matrix $L_2$ is now computed by repeating steps 6 and 7 and using 
the elements of row i of Table 7 in the diagonal positions of Table 1. The 
new $L_2$ matrix is given in transposed form in Table 8, rows 1 through 10. 

10. Step 8 is repeated in rows a through i of Table 8. Row f of this 
table gives a third approximation to the altered test lengths. 

Steps 6, 7 and 8 are repeated to get successive approximations to the test 
lengths. The calculations were carried to 5 successive approximations for 
the new test lengths, not counting the first. These are summarized in Table 
9. As will be seen, the iterations have not completely stabilized. However, 
for practical purposes, the approximation is doubtless adequate. 
TABLE 9 

Successive Approximations to $1'D_b$ for $T_b = .5T_a = .5(142) = 71$ 

                                                                          Value of $\phi$ for 
Approx'n            1       2       3       4       5       6       Σ     Successive Values of $L$ 
(.5)1'D_a:  1     12.50    4.50   15.00   11.50    7.50   20.00   71.00 
            2     13.34    5.50   12.76   12.57   12.55   14.28   71.00     $L_1$   .227 
            3     13.27    5.32   11.55   12.52   16.05   12.29   71.00     $L_2$   .234 
            4     13.23    5.20   10.98   12.47   17.62   11.51   71.01     $L_3$   .235 
            5     13.31    5.15   10.76   12.46   18.13   11.19   71.00     $L_4$   .236 
            6     13.35    5.12   10.70   12.46   18.37   11.00   71.00     $L_5$   .237 





11. To compute the successive indices of differential prediction efficiency 
$\phi_i$, we proceed as follows: 

(a) For the index corresponding to the first approximation to the new 
test length multiply each element in the $L_1$ matrix in Table 7 by the 
corresponding element of Table 3 and sum the products. This is the first entry, 
.227, in the $\phi$ column at the right of Table 9. 

(b) To get $\phi_2$ follow the same procedure except use the $L_2$ matrix in 
Table 8 instead of $L_1$ in Table 7. 

(c) In the same way calculate subsequent $\phi$'s by using the elements in 
the corresponding $L$ matrix and the elements in Table 3. 

It will be noted that $\phi$ does not increase much in this particular illustration. 
It goes from .227, taking the test lengths as one half their original 
length, to .237 as they approach optimal length. This is an increase of less 
than 5 per cent even though several of the test lengths are changed greatly. 
For example, test 5 increases from 7 to 18 minutes while test 6 reduces from 
20 to 11 minutes. 
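
The whole of steps 1 through 11 can be sketched in a few lines of code. The version below is an illustration rather than the procedure actually used for the article: it solves the linear system of step 7 by ordinary Gauss-Jordan elimination instead of the forward and back solution of Tables 5 to 7, it carries full floating-point precision, and it assumes that Tables 1, 2, and 4 have been transcribed correctly. All function and variable names are hypothetical.

```python
from math import sqrt

# Table 1: intercorrelations with reliabilities in the diagonal.
R = [
    [.920, .159, .152, .281, .763, .515],
    [.159, .920, .003, .369, .292, .243],
    [.152, .003, .920, .200, .142, -.150],
    [.281, .369, .200, .820, .549, .426],
    [.763, .292, .142, .549, .830, .628],
    [.515, .243, -.150, .426, .628, .860],
]
# Table 2: validity coefficients, one row per criterion, one column per test.
VALIDITY = [
    [.370, .177, .091, .294, .341, .357],
    [.317, .274, .016, .309, .364, .399],
    [.339, .211, .008, .241, .334, .323],
    [.526, .247, -.075, .262, .488, .524],
    [.295, .287, -.156, .200, .232, .426],
    [.184, .140, .094, .170, .229, .214],
    [.379, .169, -.001, .182, .373, .336],
    [.287, .348, -.088, .350, .336, .401],
    [.440, .170, .096, .285, .409, .403],
    [.336, .216, .031, .318, .345, .351],
]
ORIGINAL_MINUTES = [25.0, 9.0, 30.0, 23.0, 15.0, 40.0]  # Table 4, row 1


def solve(matrix, rhs):
    """Solve matrix @ X = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(matrix[i]) + list(rhs[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for row in range(n):
            if row != col:
                factor = aug[row][col]
                aug[row] = [x - factor * y for x, y in zip(aug[row], aug[col])]
    return [row[n:] for row in aug]


def iterate_lengths(R, validity, original, total_time, n_updates=5):
    """Return [(lengths, phi), ...]: the first approximation plus n_updates more."""
    n, n_crit = len(R), len(validity)
    means = [sum(row[j] for row in validity) / n_crit for j in range(n)]
    # Step 1: deviation-form validities, stored as a_c (tests x criteria).
    a_c = [[validity[i][j] - means[j] for i in range(n_crit)] for j in range(n)]
    # Step 2: A_j = original length times (1 - reliability).
    A = [original[j] * (1.0 - R[j][j]) for j in range(n)]
    # Step 3: first approximation proportional to the original lengths.
    lengths = [total_time * t / sum(original) for t in original]
    history = []
    for _ in range(n_updates + 1):
        # Steps 4-6: add A_j / b_j to the diagonal of Table 1.
        m = [[R[i][j] + (A[i] / lengths[i] if i == j else 0.0)
              for j in range(n)] for i in range(n)]
        L = solve(m, a_c)  # Step 7
        # Step 11: index of differential prediction efficiency.
        phi = sum(L[j][i] * a_c[j][i] for j in range(n) for i in range(n_crit))
        history.append((lengths, phi))
        # Step 8, rows a through f: next approximation to the lengths.
        d = [sqrt(sum(v * v for v in L[j]) * A[j]) for j in range(n)]
        s = total_time / sum(d)
        lengths = [s * v for v in d]
    return history


history = iterate_lengths(R, VALIDITY, ORIGINAL_MINUTES, 71.0)
for lengths, phi in history:
    print([round(b, 2) for b in lengths], round(phi, 3))
```

Each printed row gives one approximation to the six test lengths, which by construction sum to 71 minutes, followed by the index computed from the corresponding L matrix. If the tables are transcribed correctly the rows should track Table 9, although small differences from the rounded hand computation are to be expected.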

The technique was applied to the same data assuming that the total 
administration time was to be the same as in the original administration, 
namely, 142 minutes and also assuming it was to be doubled to 284 minutes. 
Only three approximations to the optimal test lengths were calculated for 
each of these two conditions. Tables 10 and 11 summarize the results for 
the two conditions, respectively. The last column in each table shows the 
index of differential prediction efficiency, $\phi$, corresponding to each 
approximation to optimal test lengths. In both cases the improvement of $\phi$ as the 
tests approach optimal length is appreciably greater than for the case of 
one half the original testing time. This rate of improvement is greatest for 
double testing time. As can be seen from the right-hand column of Table 11 
it goes from .305 to .337, which is approximately a 10 per cent increase. 
Further research is needed to determine the sensitivity of $\phi$ to alterations in 
relative testing time for each of the tests and to variations in total testing 
time. 
TABLE 10 

Successive Approximations to $1'D_b$ for $T_b = T_a = 142$ 

                                                                          Value of $\phi$ for 
Approx'n            1       2       3       4       5       6       Σ     Successive Values of $L$ 
(1)1'D_a:   1     25.00    9.00   30.00   23.00   15.00   40.00  142.00 
            2     25.38    8.85   20.04   25.55   33.62   28.56  142.00     $L_1$   .265 
            3     25.83    7.60   16.02   25.41   42.43   24.73  142.02     $L_2$   .282 
            4     26.48    7.30   14.72   25.22   44.54   23.73  141.99     $L_3$   .283 



TABLE 11 

Successive Approximations to $1'D_b$ for $T_b = 2T_a = 2(142) = 284$ 

                                                                          Value of $\phi$ for 
Approx'n            1       2       3       4       5       6       Σ     Successive Values of $L$ 
(2)1'D_a:   1     50.00   18.00   60.00   46.00   30.00   80.00  284.00 
            2     51.51   13.62   30.14   51.31   83.57   53.84  284.00     $L_1$   .305 
            3     54.84   10.84   22.51   51.17   97.72   46.91  284.00     $L_2$   .334 
            4     56.32   10.53   21.24   50.94   99.64   45.34  284.01     $L_3$   .337 











III. Mathematical Derivation 


In (1) a procedure is developed for altering test lengths in a battery 
to give maximum multiple correlation with a single criterion. The develop- 
ment of this procedure will be reviewed and the procedure will be extended to 
the problem of differential prediction. Let 


$M$ = the number of cases, 

$n$ = the number of predictors, 

$Z$ = an ($M \times n$) matrix of test scores in a battery of altered lengths with 
the elements of $Z$ of the form $(z_{ij} - \bar{z}_j)/(\sqrt{M}\,\sigma_{z_j})$, 

$W$ = an ($M \times 1$) vector of criterion scores with elements of the form 
$(w_i - \bar{w})/(\sqrt{M}\,\sigma_w)$, 

$B$ = an ($n \times 1$) vector of regression coefficients for estimating $W$ from $Z$, 

$r$ = an ($n \times n$) matrix of intercorrelations of tests of original lengths, 

$\rho$ = an ($n \times n$) matrix of intercorrelations of tests of altered lengths, 

$r_c$ = an ($n \times 1$) vector of validity coefficients for the tests of original 
lengths, 

$\rho_c$ = an ($n \times 1$) vector of validity coefficients for the tests of altered 
lengths, 

$D_a$ = an ($n \times n$) diagonal matrix of original test lengths, 

$D_b$ = an ($n \times n$) diagonal matrix of altered test lengths, 

$D_k = D_bD_a^{-1}$ be the ratio of altered to original test lengths, 

$D_{r_{tt}}$ = the ($n \times n$) diagonal matrix of reliability coefficients for the tests of 
original lengths. 

Let 

$$\delta = [I + (D_k - I)D_{r_{tt}}]D_k^{-1}. \qquad (1)$$ 

Let 

$$\epsilon = (ZB - W). \qquad (2)$$ 


We wish to minimize $\epsilon'\epsilon$ with the constraining condition $1'D_b1 = T$, 
where $T$ is the total testing time specified for tests with altered lengths, 
and 1 is a column vector of all unit elements. 
To obtain $\epsilon'\epsilon$ minimum under this condition, let 

$$\psi = \epsilon'\epsilon + \lambda 1'D_b1, \qquad (3)$$ 

where $\lambda$ is a Lagrangian multiplier. From (2) 

$$\psi = (B'Z'ZB - B'Z'W - W'ZB + W'W) + \lambda 1'D_b1. \qquad (4)$$ 

From the definitions above 

$$Z'Z = \rho, \qquad (5)$$ 
$$Z'W = \rho_c, \qquad (6)$$ 
$$W'W = 1. \qquad (7)$$ 

Substituting (5), (6), and (7) in (4) 

$$\psi = B'\rho B - B'\rho_c - \rho_c'B + 1 + \lambda 1'D_b1. \qquad (8)$$ 
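In (1) it is shown that 

$$\rho_c = \delta^{-1/2}r_c, \qquad (9)$$ 

and 

$$\rho = \delta^{-1/2}(r - D_u + D_uD_aD_b^{-1})\delta^{-1/2}, \qquad (10)$$ 

where we define $D_u = I - D_{r_{tt}}$, a diagonal matrix of test unreliability 
coefficients. Let 

$$B = \delta^{1/2}\beta. \qquad (11)$$ 

Substituting (9), (10), and (11) in (8) 

$$\psi = \beta'(r - D_u + D_uD_aD_b^{-1})\beta - \beta'r_c - r_c'\beta + 1 + \lambda 1'D_b1. \qquad (12)$$ 

The unknowns on the right-hand side of (12) are $\beta$, $D_b$, and $\lambda$. 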
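Differentiating (12) with respect to $\beta'$ and equating the resulting 
expression to zero to get an extremum, 

$$-\tfrac{1}{2}\,\frac{\partial\psi}{\partial\beta'} = r_c - (r - D_u + D_uD_aD_b^{-1})\beta = 0,$$ 

or 

$$\beta = (r - D_u + D_uD_aD_b^{-1})^{-1}r_c. \qquad (13)$$ 

Differentiating $\psi$ with respect to the scalars $b_i$ ($i = 1, 2, \cdots, n$), the 
diagonal elements of $D_b$, and equating the $n$ resulting expressions to zero, 

$$\frac{\partial\psi}{\partial b_i} = \lambda - \beta_i^2(u_ia_i)/b_i^2 = 0 \qquad (14)$$ 

or 

$$b_i = \beta_i(u_ia_i)^{1/2}/\lambda^{1/2}, \qquad (15)$$ 

where $u_i$ and $a_i$ are the $i$th diagonal elements of $D_u$ and $D_a$. 
Summing these $n$ equations 

$$\sum b_i = \sum \beta_i(u_ia_i)^{1/2}/\lambda^{1/2}.$$ 

Thus in matrix notation we obtain 

$$\lambda^{1/2} = 1'(D_uD_a)^{1/2}\beta/1'D_b1. \qquad (16)$$ 

Substituting for $\lambda^{1/2}$ in (15) and collecting these $n$ expressions as the diagonal 
matrix $D_b$ we obtain 

$$D_b = D_\beta(D_uD_a)^{1/2}\frac{1'D_b1}{1'(D_uD_a)^{1/2}\beta}, \qquad (17)$$ 

where $D_\beta$ is a diagonal matrix with the $\beta_i$ as diagonal elements. 
In (1) it is shown that 

$$\beta = \left(r - D_u + \frac{(D_uD_a)^{1/2}11'(D_uD_a)^{1/2}}{1'D_b1}\right)^{-1}r_c. \qquad (18)$$ 

Using (18) in (17) we can therefore solve for $D_b$, the new test lengths. The 
new multiple correlation is given by 

$$R^2 = \beta'r_c. \qquad (19)$$ 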
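
The algebra leading to (18) is given in (1) and is not repeated in the article. One way to see the result, offered here only as a sketch and assuming every $\beta_i$ is non-zero so that (15) can be used directly, is to eliminate $D_b^{-1}$ from (13) by means of (15) and (16):

```latex
\begin{align*}
b_i &= \beta_i (u_i a_i)^{1/2} / \lambda^{1/2}
  && \text{from (15)} \\
\frac{u_i a_i}{b_i}\,\beta_i &= \lambda^{1/2} (u_i a_i)^{1/2}
  && \text{so that } D_u D_a D_b^{-1}\beta = \lambda^{1/2} (D_u D_a)^{1/2} 1 \\
\lambda^{1/2} &= \frac{1'(D_u D_a)^{1/2}\beta}{1'D_b 1}
  && \text{from (16)} \\
(r - D_u)\,\beta + \frac{(D_u D_a)^{1/2}\, 1 1' \,(D_u D_a)^{1/2}}{1'D_b 1}\,\beta &= r_c
  && \text{substituting in (13)}
\end{align*}
```

Solving the last line for $\beta$ gives a form that no longer involves the unknown lengths, only the specified total $1'D_b1 = T$. This is why the single-criterion case can be solved directly, whereas the differential case requires successive approximations.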

Next we extend the procedure to the case of differential prediction. 
Consider the following additions to the definitions given above. Let 

$N$ = the number of criteria, 

$W$ = an ($M \times N$) matrix of criterion scores whose elements are deviate 
scores of the form $(w_{ij} - \bar{w}_j)/(\sqrt{M}\,\sigma_{w_j})$, 

$H$ = an ($M \times N'$) matrix consisting of difference vectors for all possible 
pairs of criterion vectors $i$ and $j$, including $i = j$, 

$B$ = an ($n \times N'$) matrix of “least squares” regression vectors for estimating 
$H$ from $Z$, 

$r_c$ = the ($n \times N$) matrix of validity coefficients with the tests of original 
lengths, 

$\rho_c$ = the ($n \times N$) matrix of validity coefficients with the tests of altered 
lengths. 
From the differential prediction procedure (2) we have 

$$\phi = 1'D_C1 - (1'C1/N), \qquad (20)$$ 

the index of differential prediction efficiency, where 

$$C = r_c'r^{-1}r_c, \qquad (21)$$ 

and $D_C$ is a diagonal matrix of the diagonals of $C$. 
Let 

$$E = ZB - H \qquad (22)$$ 

and 

$$F_i = e_i1' - I, \qquad (23)$$ 

where $e_i$ is a column vector of all zero elements except the $i$th, which is unity. 
Let 

$$G' = (F_1, F_2, \cdots, F_N). \qquad (24)$$ 

Thus we have 

$$H = WG' \qquad (25)$$ 

and 

$$E = ZB - WG'. \qquad (26)$$ 

From (23), (24), and (26) postmultiplied by $G$ and divided by $2N$ we obtain 

$$\epsilon = (EG/2N) = (ZBG/2N) - W[I - (11'/N)] \qquad (27)$$ 

since $G'G = 2N[I - (11'/N)]$. It can be shown that the trace of $\epsilon'\epsilon$ is 
equivalent to the trace of $E'E$. Let 

$$BG/2N = J. \qquad (28)$$ 

Let 

$$W[I - (11'/N)] = t. \qquad (29)$$ 

Then 

$$\epsilon = ZJ - t. \qquad (30)$$ 
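
The identity $G'G = 2N[I - (11'/N)]$ used in passing from (26) to (27) is easy to confirm numerically. The sketch below builds $G'$ from the blocks $F_i = e_i1' - I$ of (23) and (24) for a small, arbitrarily chosen number of criteria and compares both sides. The function names and the choice N = 4 are illustrative assumptions.

```python
def difference_operator(n_criteria):
    """Build G' = (F_1, ..., F_N), with F_i = e_i 1' - I, as an N x N^2 list of rows."""
    rows = []
    for r in range(n_criteria):
        row = []
        for i in range(n_criteria):        # block F_i
            for c in range(n_criteria):    # column c within block F_i
                row.append((1.0 if r == i else 0.0) - (1.0 if r == c else 0.0))
        rows.append(row)
    return rows


def gram(g_t):
    """Return G'G, the matrix of inner products between the rows of G'."""
    n = len(g_t)
    return [[sum(x * y for x, y in zip(g_t[a], g_t[b])) for b in range(n)]
            for a in range(n)]


N = 4  # illustrative number of criteria
g_t = difference_operator(N)
lhs = gram(g_t)
rhs = [[2 * N * ((1.0 if a == b else 0.0) - 1.0 / N) for b in range(N)]
       for a in range(N)]
print(lhs == rhs)
```

Each column of $WG'$ is a difference $w_i - w_j$ of two criterion vectors, so the check also confirms that all $N^2$ ordered pairs, including $i = j$, enter $H$.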
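We wish to minimize the trace of $\epsilon'\epsilon$ with the constraining condition 
$1'D_b1 = T$. 
Let 

$$\psi = \mathrm{tr}\,\epsilon'\epsilon + \lambda 1'D_b1. \qquad (31)$$ 

Let 

$$\gamma_c = \rho_c[I - (11'/N)]. \qquad (32)$$ 

Substituting (5), (6), (9), (10), (28), (29), (30), and (32) in (31) we obtain 

$$\psi = \mathrm{tr}\,[J'\delta^{-1/2}(r - D_u + D_uD_aD_b^{-1})\delta^{-1/2}J - J'\gamma_c - \gamma_c'J + t't] + \lambda 1'D_b1. \qquad (33)$$ 

Let 

$$\delta^{-1/2}J = L. \qquad (34)$$ 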
Let 

$$a_c = r_c[I - (11'/N)] = \delta^{1/2}\gamma_c. \qquad (35)$$ 

Let 

$$R = r - D_u. \qquad (36)$$ 

Let 

$$A = D_aD_u. \qquad (37)$$ 

Substituting (34), (35), (36), and (37) in (33) we obtain 

$$\psi = \mathrm{tr}\,[L'(R + AD_b^{-1})L - L'a_c - a_c'L + t't] + \lambda 1'D_b1. \qquad (38)$$ 

Differentiating (38) with respect to row vectors of $L'$ and equating the results 
to 0 we obtain 

$$-\tfrac{1}{2}\,\frac{\partial\psi}{\partial L'} = a_c - (R + AD_b^{-1})L = 0,$$ 

or 

$$a_c = (R + AD_b^{-1})L. \qquad (39)$$ 

Differentiating (38) with respect to $D_b$ and equating the results to 0 we 
obtain 

$$\frac{\partial\psi}{\partial D_b} = \lambda I - D_{LL'}AD_b^{-2} = 0, \qquad (40)$$ 

where $D_{LL'}$ is a diagonal matrix whose non-zero elements are the diagonal 
elements of $LL'$. Hence 

$$D_b = (D_{LL'}A)^{1/2}/\lambda^{1/2}. \qquad (41)$$ 

It can be shown that 

$$\lambda^{1/2} = \frac{1'(D_{LL'}A)^{1/2}1}{1'D_b1} = \frac{1'(D_{LL'}A)^{1/2}1}{T}. \qquad (42)$$ 

Substituting (42) in (41), 

$$D_b = (D_{LL'}A)^{1/2}\frac{T}{1'(D_{LL'}A)^{1/2}1}. \qquad (43)$$ 

From (39) 

$$L = (R + AD_b^{-1})^{-1}a_c. \qquad (44)$$ 

Let 

$$L_i = (R + AD_{b_i}^{-1})^{-1}a_c, \qquad (45)$$ 

where 

$$D_{b_1} = \frac{T}{1'D_a1}D_a \qquad (46)$$ 

and 

$$D_{b_{i+1}} = (D_{L_iL_i'}A)^{1/2}\frac{T}{1'(D_{L_iL_i'}A)^{1/2}1}. \qquad (47)$$ 

Using (45), (46), and (47) as a basis of successive approximations to $L_i$ and 
$D_{b_i}$, continue until $D_{b_i}$ stabilizes satisfactorily. 
Now the regression vectors for the optimal test lengths will be given by


Y=p''p.. (48) 
From (9), (10), (36), (37), and (48) 
Y = 8'7°(R + AD;')"'r. . (49) 
But from (35) and (39) 
L = (R + AD,")’r.[I — (11'/N)]. (50) 


From (50) 
aL = 8/°(R + AD,")'r. — &7(R + AD;z")"'r.(11'/N). (51) 


From (49) and (51) 
Y = 6?(L + (R + AD;')"'r.(11'/N)]. (52) 


Furthermore the index of differential prediction efficiency $\phi$ as defined
in (2) can be shown to be

$$\phi = \operatorname{tr} L'a_c. \tag{53}$$

The procedures outlined in Section II may be related to the above 
mathematical development as follows: 

Table 1 is given by (36). 

Step 1 is based on (35) 

Step 2 is based on (37). 

Step 3 is based on (46). 

Step 4 consists of calculating D7! from D,, . 

Step 5 consists of calculating AD} . 

Step 6 consists of calculating the parenthetical term on the right side of
(45) for i = 1.

Step 7 consists of calculating L, from (45). 

Step 8 consists of calculating D,, from (47). 

Step 9 consists of calculating an L, matrix from (45). 

Step 10 uses equation (47) to calculate D,, . 

In general steps 6 and 7 are repeated for successive values of i in (45)
and step 8 is repeated for successive values of i in (47).

Step 11 uses (53) to get successive values of $\phi$.
THE REGRESSION OF GAINS UPON INITIAL SCORES


R. F. GARSIDE
UNIVERSITY OF DURHAM, ENGLAND


A method of estimating the regression of gains upon initial scores is 
suggested and compared with two other methods which have been used in 
recent investigations. 


I. Introduction 


Thorndike (12) pointed out in 1924 that the correlation between obtained 
gains and initial scores tended to be negative when the correlation between 
true gains and true initial scores was not negative. In the same year Thomson 
(10) showed that this tendency was due to the same errors of measurement 
occurring in both gains and initial scores with different sign. He derived a 
formula for calculating the correlation between true gains and true initial 
scores. Thomson (11) and Zieve (16) have given alternative forms of 
Thomson’s original formula. They did not, however, consider curvilinear 
regression. 

The question of the regression of gains upon initial scores has arisen 
again recently in several papers concerned with the effect of practice and 
coaching on intelligence test scores (7, 13, and 14). It therefore seems 
appropriate to consider the measurement of this regression, both linear 
and curvilinear. 


II. Linear Regression of Gains upon Initial Scores 


If the same test is given twice, or two parallel tests as defined by
Gulliksen (3, p. 11) are given to the same group of testees, then

$$G = Y - X, \tag{1}$$

where X = true initial score, Y = true final score and G = true gain. Now

$$r_{GX} = \frac{\sigma_Y r_{XY} - \sigma_X}{\sqrt{\sigma_X^2 + \sigma_Y^2 - 2\sigma_X\sigma_Y r_{XY}}} \tag{2}$$

and, therefore,

$$b_{GX} = b_{YX} - 1. \tag{3}$$

It is clear from (2) that if $r_{XY} = 1$, then $r_{GX} = 1$. But if there is any
variation in true gain between individuals within arrays of true initial score,
then neither $r_{XY}$ nor $r_{GX}$ will equal unity. If, and only if, $r_{XY} = r_{GX} = 1$, then

$$b_{GX} = (\sigma_Y/\sigma_X) - 1. \tag{4}$$

In general, however, $b_{GX}$ increases with $r_{XY}$ as well as with the ratio of $\sigma_Y$ to
$\sigma_X$. Thus any estimate of the regression of gains upon initial scores should
increase not only with the ratio of the standard deviations of final and initial
scores but also with the correlation between these scores.

In practice, what we want to do is to measure the true gains at different 
levels of initial ability as indicated by initial test scores. That is, we want 
to be able to ascertain the regression of true gains upon obtained initial 
scores, whether the regression be linear or curvilinear. We shall deal with 
linear regression first. Now 

$$r_{Gx} = \frac{\sigma_Y r_{xY} - \sigma_X r_{xX}}{\sigma_G}, \tag{5}$$

where x = obtained initial score, and, therefore,

$$b_{Gx} = \frac{\sigma_Y r_{xY} - \sigma_X r_{xX}}{\sigma_x}. \tag{6}$$

Assuming errors of measurement to be independent of initial scores, and
applying the usual correction for attenuation, (6) becomes

$$b_{Gx} = \frac{\sigma_y r_{xy} - \sigma_x r_{xx'}}{\sigma_x}, \tag{7}$$

where y = obtained final score and $r_{xx'}$ = reliability of initial score. Thus,

$$b_{Gx} = b_{yx} - r_{xx'}. \tag{8}$$

The value of $b_{Gx}$ obtained from (8) increases with $r_{xy}$ and with the ratio of
$\sigma_y$ to $\sigma_x$. Errors of measurement do not tend to make $b_{Gx}$ negative; they do
not appear in G.
It should be noted that $r_{xx'}$ may not be equal to $r_{xy}$; if $r_{XY} < 1$, then

$$r_{xy} < \sqrt{r_{xx'}r_{yy'}}, \tag{9}$$

where $r_{yy'}$ = reliability of final score. Thus it is essential to obtain an
independent measure of $r_{xx'}$. One of the internal consistency methods must
be used for this purpose. If the split-half method is used, then the formula
for the variance of $b_{Gx}$, derived below, is applicable.

The first half of the initial test must not be correlated with the second 
half. Odd items should be correlated with even items, so that individual 
variations in gain within the initial test itself will not reduce the correlation 
between its two halves. The obtained correlation must, of course, be corrected 
to give $r_{xx'}$. If the variance of $b_{Gx}$ is to be calculated by the method derived
below, the Spearman-Brown formula [Kelley (4, p. 406)] should be used in
the form

$$r_{xx'} = \frac{4C_{zz'}}{V_x}, \tag{10}$$

where $C_{zz'}$ = covariance of z and z', the two halves of the initial test, and
$V_x$ = variance of the initial test.
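
Equations (8) and (10) can be applied directly to raw scores. The sketch below assumes three equally long lists: `z` and `z_prime` (scores on the odd-item and even-item halves of the initial test) and `y` (final scores). These names, and the use of divisor N for variances and covariances, are assumptions made for illustration.

```python
def mean(values):
    return sum(values) / len(values)


def covariance(a, b):
    """Covariance with divisor N; covariance(a, a) is the variance."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / len(a)


def gain_regression(z, z_prime, y):
    """Return the split-half reliability (10), b_yx, and b_Gx from (8)."""
    x = [u + v for u, v in zip(z, z_prime)]      # obtained initial score
    v_x = covariance(x, x)
    r_xx = 4 * covariance(z, z_prime) / v_x      # equation (10)
    b_yx = covariance(x, y) / v_x
    return {"r_xx": r_xx, "b_yx": b_yx, "b_Gx": b_yx - r_xx}   # equation (8)
```

With the toy data z = [1, 2, 3, 4], z' = [2, 1, 4, 3], y = [4, 5, 9, 10], the sketch gives a reliability of .75, $b_{yx}$ = 1.25, and $b_{Gx}$ = .50.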

It should be borne in mind that the split-half method only gives accurate 
results when the two halves of the test measure the same factor or factors, 
and when the test is a power test rather than a speed test. Cronbach and 
Warrington (2) have recently discussed the estimation of the reliability of 
time-limit tests. 


III. Variance of Regression Coefficient 


To test the significance of the linear regression of true gains upon initial
scores, we require the variance of $b_{Gx}$. If the split-half method has been used
to estimate $r_{xx'}$, then

$$V_{b_{Gx}} = V_{b_{yx}} + V_{4C_{zz'}/V_x} - 2C_{(b_{yx})(4C_{zz'}/V_x)}. \tag{11}$$

The variance of $4C_{zz'}/V_x$ can be found by Pearson's method (6, p. 492).
Ignoring cubic terms, then it follows directly from Pearson's equation (iii)
that

$$V_{4C_{zz'}/V_x} = \frac{(4C_{zz'})^2}{V_x^2}\left[\frac{V_{4C_{zz'}}}{(4C_{zz'})^2} + \frac{V_{V_x}}{V_x^2} - \frac{2C_{V_x,4C_{zz'}}}{4V_xC_{zz'}}\right]. \tag{12}$$

But

$$V_{4C_{zz'}} = 16V_{C_{zz'}} \tag{13}$$

and

$$C_{V_x,4C_{zz'}} = 4C_{[V_z + V_{z'} + 2C_{zz'}],C_{zz'}} = 4C_{V_z,C_{zz'}} + 4C_{V_{z'},C_{zz'}} + 8V_{C_{zz'}}. \tag{14}$$

According to Wishart (15, p. 44), as quoted by Kelley (4, p. 555), for a normal
bivariate population,

$$N^2V_{C_{12}} = (N - 1)(V_1V_2 + C_{12}^2), \tag{15}$$

where V and C indicate population variance and covariance, respectively,
and

$$N^2C_{V_1,C_{12}} = 2(N - 1)V_1C_{12}. \tag{16}$$

If N is large, we may substitute sample values for the parameters in (15)
and (16) and write (13) and (14) as follows:

$$V_{4C_{zz'}} = \frac{16(V_zV_{z'} + C_{zz'}^2)}{N} \tag{17}$$

and

$$C_{V_x,4C_{zz'}} = (8V_zC_{zz'} + 8V_{z'}C_{zz'} + 8V_zV_{z'} + 8C_{zz'}^2)/N = [8(V_z + C_{zz'})(V_{z'} + C_{zz'})]/N. \tag{18}$$
Similarly, it follows directly from another of Wishart's equations (15) that

$$V_{V_x} = 2V_x^2/N. \tag{19}$$

Substituting (17), (18), and (19) in (12), simplifying, and collecting terms,

$$V_{4C_{zz'}/V_x} = \frac{16}{NV_x^3}\left[V_x(V_zV_{z'} + 3C_{zz'}^2) - 4C_{zz'}(V_z + C_{zz'})(V_{z'} + C_{zz'})\right]. \tag{20}$$

But

$$V_x = V_z + V_{z'} + 2C_{zz'}. \tag{21}$$

Therefore, substituting (21) in (20), simplifying, and collecting terms,

$$\begin{aligned}
V_{4C_{zz'}/V_x} &= \frac{16V_zV_{z'}}{NV_x^3}\left[V_z + V_{z'} - 2C_{zz'} - V_zr_{zz'}^2 - V_{z'}r_{zz'}^2 + 2C_{zz'}r_{zz'}^2\right] \\
&= \frac{16V_zV_{z'}}{NV_x^3}\left[V_z + V_{z'} - 2C_{zz'}\right]\left[1 - r_{zz'}^2\right].
\end{aligned} \tag{22}$$

We may note, in passing, that if it is assumed that $V_z = V_{z'}$, then (22)
simplifies to

$$V_{4C_{zz'}/V_x} = \frac{4(1 - r_{zz'})^2}{N(1 + r_{zz'})^2}, \tag{23}$$

which is the same result as obtained by Shen (8, p. 462), using quite a
different method.
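
The reduction of (22) to (23) for equal half-test variances can be confirmed numerically. This is a short sketch with hypothetical function names; the values $V_z = V_{z'} = 27$, $C_{zz'} = 23$, N = 400 are chosen to agree with the example in Section VI, where $V_x = 100$.

```python
def var_split_half_ratio(v_z, v_zp, c_zzp, n):
    """Large-sample variance of 4*C_zz'/V_x, following equation (22)."""
    v_x = v_z + v_zp + 2 * c_zzp                       # equation (21)
    r_sq = c_zzp ** 2 / (v_z * v_zp)                   # squared correlation of the halves
    return 16 * v_z * v_zp * (v_z + v_zp - 2 * c_zzp) * (1 - r_sq) / (n * v_x ** 3)


def var_equal_halves(r_zzp, n):
    """Equation (23): the special case V_z = V_z'."""
    return 4 * (1 - r_zzp) ** 2 / (n * (1 + r_zzp) ** 2)
```

With the values above both functions give .000064, so the standard error of the reliability estimate is .008.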
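The covariance of $b_{yx}$ and $4C_{zz'}/V_x$ can be derived by writing

$$b_{yx} = C_{xy}/V_x. \tag{24}$$

Then, by using Pearson's equation (v) (6, p. 493),

$$C_{(C_{xy}/V_x)(4C_{zz'}/V_x)} = \frac{4C_{xy}C_{zz'}}{V_x^2}\left[\frac{C_{C_{xy},4C_{zz'}}}{4C_{xy}C_{zz'}} - \frac{C_{V_x,C_{xy}}}{V_xC_{xy}} - \frac{C_{V_x,4C_{zz'}}}{4V_xC_{zz'}} + \frac{V_{V_x}}{V_x^2}\right]. \tag{25}$$

Now

$$C_{C_{xy},4C_{zz'}} = 4C_{C_{(z+z')y},C_{zz'}} = 4C_{[C_{zy} + C_{z'y}],C_{zz'}} = 4C_{C_{zy},C_{zz'}} + 4C_{C_{z'y},C_{zz'}}. \tag{26}$$

But Wishart (15) has shown that, for a normal trivariate population,

$$N^2C_{C_{12},C_{13}} = (N - 1)(V_1C_{23} + C_{12}C_{13}). \tag{27}$$

Thus, assuming N is large, we may write

$$\begin{aligned}
C_{C_{xy},4C_{zz'}} &= 4(V_zC_{z'y} + C_{zy}C_{zz'} + V_{z'}C_{zy} + C_{z'y}C_{zz'})/N \\
&= 4[C_{z'y}(V_z + C_{zz'}) + C_{zy}(V_{z'} + C_{zz'})]/N.
\end{aligned} \tag{28}$$

Moreover, it follows from (16) that

$$C_{V_x,C_{xy}} = 2V_xC_{xy}/N. \tag{29}$$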
Therefore, substituting (18), (19), (28), and (29) in (25) and simplifying,

$$\begin{aligned}
C_{(C_{xy}/V_x)(4C_{zz'}/V_x)} &= \frac{4}{NV_x^3}\left[V_xC_{z'y}(V_z + C_{zz'}) + V_xC_{zy}(V_{z'} + C_{zz'}) - 2C_{xy}(V_z + C_{zz'})(V_{z'} + C_{zz'})\right] \\
&= \frac{4}{NV_x^3}\left[V_xC_{zz'}(C_{zy} + C_{z'y}) + V_x(V_zC_{z'y} + V_{z'}C_{zy}) - 2C_{xy}(V_z + C_{zz'})(V_{z'} + C_{zz'})\right].
\end{aligned} \tag{30}$$

But

$$C_{zy} + C_{z'y} = C_{xy}. \tag{31}$$

Substituting (21) and (31) in (30), multiplying out, and collecting terms,

$$\begin{aligned}
C_{(C_{xy}/V_x)(4C_{zz'}/V_x)} &= \frac{4}{NV_x^3}\left[V_x(V_zC_{z'y} + V_{z'}C_{zy}) - C_{xy}(V_zC_{zz'} + V_{z'}C_{zz'} + 2V_zV_{z'})\right] \\
&= \frac{4}{NV_x^2}\left[(V_zC_{z'y} + V_{z'}C_{zy}) - b_{yx}(V_zC_{zz'} + V_{z'}C_{zz'} + 2V_zV_{z'})\right] \\
&= \frac{4V_zV_{z'}}{NV_x^2}\left[b_{yz} + b_{yz'} - b_{yx}(b_{zz'} + b_{z'z} + 2)\right].
\end{aligned} \tag{32}$$


Thus, substituting (22) and (32) and the usual formula for $V_{b_{yx}}$ in (11), we get

$$V_{b_{Gx}} = \frac{1}{N}\left\{\frac{V_y}{V_x}\left[1 - r_{xy}^2\right] + \frac{16V_zV_{z'}}{V_x^3}\left[V_z + V_{z'} - 2C_{zz'}\right]\left[1 - r_{zz'}^2\right] - \frac{8V_zV_{z'}}{V_x^2}\left[b_{yz} + b_{yz'} - b_{yx}(b_{zz'} + b_{z'z} + 2)\right]\right\}. \tag{33}$$

Equation (33) involves the assumptions that N is large and that distributions
are normal.

It has been pointed out to me that, by a theorem of Cramér (1, p. 366),
any function of moments such as $b_{Gx}$, when N is large, is approximately
normally distributed about the corresponding population parameter. Thus,
if N is large, $b_{Gx}$ may be regarded as being distributed normally about
$\beta_{Gx}$ with a variance as given by equation (33).
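
Equation (33) can be evaluated from the sample moments alone. The following sketch uses hypothetical argument names; the covariances of each half with the final score (`c_zy`, `c_zpy`) are not given anywhere in the text and are assumed here purely for illustration.

```python
import math


def var_b_gx(n, v_y, v_z, v_zp, c_zzp, c_zy, c_zpy):
    """Large-sample variance of b_Gx, following equation (33)."""
    v_x = v_z + v_zp + 2 * c_zzp                 # equation (21)
    c_xy = c_zy + c_zpy                          # equation (31)
    b_yx = c_xy / v_x
    r_xy_sq = c_xy ** 2 / (v_x * v_y)
    r_zzp_sq = c_zzp ** 2 / (v_z * v_zp)
    slope_term = (v_y / v_x) * (1 - r_xy_sq)     # usual variance of b_yx, times N
    reliability_term = (16 * v_z * v_zp * (v_z + v_zp - 2 * c_zzp)
                        * (1 - r_zzp_sq) / v_x ** 3)          # from (22)
    bracket = (c_zy / v_z + c_zpy / v_zp
               - b_yx * (c_zzp / v_zp + c_zzp / v_z + 2))     # from (32)
    covariance_term = 8 * v_z * v_zp * bracket / v_x ** 2
    return (slope_term + reliability_term - covariance_term) / n


def standard_error(variance):
    return math.sqrt(variance)
```

Taking the Section VI values ($V_y$ = 144, $V_x$ = 100, $C_{zz'}$ = 23, N = 400) and assuming equal halves with $V_z = V_{z'}$ = 27 and $C_{zy} = C_{z'y}$ = 48, the covariance term vanishes and the variance is about .00136. The coefficient $b_{Gx}$ = .04 would then be roughly 1.08 standard errors from zero.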


IV. Curvilinear Regression


When the regression of true gains upon initial scores is not linear, then
a polynomial

$$G = a + bx + cx^2 + \cdots + px^n \tag{34}$$

can be used. But before the weighting coefficients b, c, etc. can be calculated,
we must ascertain the relationships between them and the coefficients b',
c', etc. of the polynomial

$$y = a' + b'x + c'x^2 + \cdots + p'x^n. \tag{35}$$
These coefficients may be calculated from the available data using the 
ordinary multiple correlation technique, which Kelley (4, p. 445) refers to in 
this connection as parabolic regression. 

It follows from (6), by substituting $x^a$ for x, multiplying by $V_{x^a}$ and
correcting for attenuation, that

$$C_{Gx^a} = C_{yx^a} - C_{x'x^a}. \tag{36}$$

$C_{x'x^a}$ in (36) is the covariance between the scores of a hypothetical
application of the initial test and the ath power of the scores of another application
of the initial test, a being an integer.

If the two applications of the initial test are parallel, then their standard
deviations are equal. Thus, following Mollenkopf (5), as quoted by Gulliksen
(3, pp. 119 and 120), we may write

$$x' = r_{xx'}x + e', \tag{37}$$

where x and x' = deviation scores of the initial test and e' = error of estimate
when x' is predicted from x. Now

$$NC_{x'x^a} = \Sigma x'x^a. \tag{38}$$

Therefore, substituting (37) in (38),

$$NC_{x'x^a} = r_{xx'}\Sigma x^{a+1} + \Sigma e'x^a. \tag{39}$$

Let us assume that

$$r_{e'x^a} = 0. \tag{40}$$

From the raw score correlation formula

$$r_{e'x^a} = (\Sigma e'x^a - N\bar{e}'\,\overline{x^a})/(N\sigma_{e'}\sigma_{x^a}). \tag{41}$$

Thus

$$\Sigma e'x^a = N\bar{e}'\,\overline{x^a}. \tag{42}$$

But

$$N\bar{e}' = \Sigma e' = 0. \tag{43}$$

Therefore

$$\Sigma e'x^a = 0. \tag{44}$$

Thus, substituting (44) in (39),

$$C_{x'x^a} = r_{xx'}\Sigma x^{a+1}/N. \tag{45}$$

Therefore

$$C_{x'x^a} = r_{xx'}D_{xx^a}, \tag{46}$$

where $D_{xx^a}$ = covariance between x and $x^a$. Therefore, substituting (46)
in (36),

$$C_{Gx^a} = C_{yx^a} - r_{xx'}D_{xx^a}. \tag{47}$$

In matrix notation,

$$b_{Gx} = C_{Gx}C_{xx}^{-1}. \tag{48}$$

Therefore, substituting (47) in (48),

$$b_{Gx} = (C_{yx} - r_{xx'}D_{xx})C_{xx}^{-1}, \tag{49}$$

where $D_{xx}$ = the row matrix $[V_x \;\; C_{xx^2} \; \cdots \; C_{xx^n}]$, and $r_{xx'}$ = a scalar. Thus

$$b_{Gx} = b_{yx} - r_{xx'}D_{xx}C_{xx}^{-1}. \tag{50}$$

But

$$D_{xx}C_{xx}^{-1} = (1 \;\; 0 \;\; 0 \; \cdots \; 0). \tag{51}$$

Therefore, reverting to ordinary notation,

$$b_{Gx \cdot x^2 \cdots x^n} = b_{yx \cdot x^2 \cdots x^n} - r_{xx'} \tag{52}$$

and

$$b_{Gx^c \cdot x\,x^d \cdots x^n} = b_{yx^c \cdot x\,x^d \cdots x^n}, \tag{53}$$

where c, d, and n are different integers not equal to unity.

It is evident from (52) and (53) that, although the regression lines of 
G and y on x do not have the same slope, they are, nevertheless, the same 
shape. If, therefore, the regression of y on x is not linear, then the departure 
from linearity must be due to the curvilinear regression of G on x. Thus, 
the usual test of departure from linear regression [Snedecor (9, section 14.4)] 
of y on x provides a test of the significance of the curvilinear regression of 
G on x. If the regression of y on x is significantly non-linear, it may be found
that a second-degree polynomial will fit the data satisfactorily. Whether
the inclusion of $x^3$ and higher powers of x produces any significant
improvement in the prediction of y, and therefore of G, can be tested by analysis of
variance [Kelley (4, p. 448)].
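
In computational terms, (52) and (53) say that only the coefficient of x changes when passing from the fitted polynomial for y to the polynomial for G. A minimal sketch follows, assuming the coefficients b', c', ..., p' have already been obtained by ordinary multiple regression of y on x, x², ..., xⁿ; the function name is hypothetical and the constant term is left aside.

```python
def gain_polynomial(y_coefficients, r_xx):
    """Map [b', c', ..., p'] for y on x, x^2, ..., x^n to [b, c, ..., p] for G.

    By (52) the coefficient of x is reduced by the reliability r_xx';
    by (53) the coefficients of the higher powers are unchanged.
    """
    gain_coefficients = list(y_coefficients)
    gain_coefficients[0] = gain_coefficients[0] - r_xx
    return gain_coefficients
```

For instance, hypothetical coefficients b' = .96 and c' = .01 with a reliability of .92 would give b = .04 and c = .01.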

It may be argued against the method here proposed that gains should 
not be forced on to a curve of certain shape. But if a polynomial of sufficiently
high degree has been used so that the deviations of y, and therefore of G, 
from the resulting curve are not significant, then such deviations can be 
ignored. If, however, there are a priori reasons for taking such non-significant 
deviations into account, then a polynomial of higher degree can be fitted. 


V. Comparison with Other Methods 


Wiseman and Wrigley (14) used three methods of calculating differential 
gains but finally decided to base their conclusions on the method which they 
describe as follows: 


The "level" of ability was defined by taking the sum of the quotients for the initial
and final tests. Thus, if x is the score on the initial test and y the score on the final test,
(x + y) was found for each child. The x + y scores were now placed in rank order and
divided into levels (e.g. 180-199, 200-219), and the average gain (y - x) for each level
calculated. By plotting these points graphically it can easily be seen whether there is any rise or
fall in (y - x) with increase in (y + x), and by fitting a straight line, or curve, to the points,
the gain at any particular ability level may be estimated. It is necessary, of course, to
re-translate the x + y score into an I.Q. score on the first test in order to make the results
meaningful. One way of doing this would be to find the average value of x (I.Q. on initial
test) for each class-interval of x + y. A preferable method is to find equivalent x values from
the regression of x on (x + y). This was done.


Now

$$r_{(y-x)(y+x)} = (V_y - V_x)/(\sigma_{(y-x)}\sigma_{(y+x)}), \tag{54}$$

$$r_{x(y+x)} = (\sigma_yr_{xy} + \sigma_x)/\sigma_{(y+x)}, \tag{55}$$

and

$$E_x = b_{x(y+x)}(y + x) + c, \tag{56}$$

where $E_x$ = estimated value of x, and c = a constant.

Having made a correlation scatter of (y - x) against (y + x), the
correlation $r_{(y-x)(y+x)}$ is not changed by the substitution of $E_x$ for (y + x),
since the correlation scatter itself remains unchanged. Therefore we may
write

$$r_{(y-x)E_x} = r_{(y-x)(y+x)}. \tag{57}$$

Now

$$\sigma_{E_x} = \sigma_xr_{x(y+x)}. \tag{58}$$

Thus, substituting (55) in (58),

$$\sigma_{E_x} = \sigma_x(\sigma_yr_{xy} + \sigma_x)/\sigma_{(y+x)}. \tag{59}$$

Now

$$b_{(y-x)E_x} = r_{(y-x)E_x}\sigma_{(y-x)}/\sigma_{E_x}. \tag{60}$$

Therefore, substituting (54), (57), and (59) in (60),

$$b_{(y-x)E_x} = \frac{V_y - V_x}{\sigma_x(\sigma_yr_{xy} + \sigma_x)}. \tag{61}$$

Therefore

$$b_{(y-x)E_x} = \frac{(V_y/V_x) - 1}{b_{yx} + 1}. \tag{62}$$

The coefficient $b_{(y-x)E_x}$ is an estimate of the linear regression coefficient
of gains upon initial score obtained by Wiseman and Wrigley. Errors of
measurement do not tend to make this coefficient negative and it increases
as $r_{xy}$ decreases.

Peel (7) describes the percentile method as follows: ‘After calculating 
the scores at given percentiles in each of the three distributions, the differ- 
ences between these ‘equivalent levels’ are then taken as the practice effects.” 
These practice effects at each given percentile are then compared with 
corresponding initial scores. Assuming that the distributions are normal, 
the percentile method will give results as given in Table 1. 


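TABLE 1

Theoretical Differential Gains

Percentile   Initial Test (x)         Final Test (y)           Gain (g = y - x)
98th         $\bar{x} + 2\sigma_x$    $\bar{y} + 2\sigma_y$    $(\bar{y} - \bar{x}) + 2(\sigma_y - \sigma_x)$
84th         $\bar{x} + \sigma_x$     $\bar{y} + \sigma_y$     $(\bar{y} - \bar{x}) + (\sigma_y - \sigma_x)$
50th         $\bar{x}$                $\bar{y}$                $(\bar{y} - \bar{x})$
16th         $\bar{x} - \sigma_x$     $\bar{y} - \sigma_y$     $(\bar{y} - \bar{x}) - (\sigma_y - \sigma_x)$
2nd          $\bar{x} - 2\sigma_x$    $\bar{y} - 2\sigma_y$    $(\bar{y} - \bar{x}) - 2(\sigma_y - \sigma_x)$


We can therefore write

$$g = \frac{\sigma_y - \sigma_x}{\sigma_x}(x - \bar{x}) + (\bar{y} - \bar{x}) \tag{63}$$

and, therefore, assuming distributions are normal,

$$b_{gx} = (\sigma_y/\sigma_x) - 1. \tag{64}$$

It is evident that errors of measurement will not tend to make this regression
coefficient negative. Moreover, gains calculated by the percentile method are
independent of $r_{xy}$ but increase as the ratio of $\sigma_y$ to $\sigma_x$ increases.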
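
The entries of Table 1 and the slope in (64) can be reproduced with the standard normal quantile function. The sketch below is illustrative only: the means of 100 and 105 used in the usage note are hypothetical, and the function name is not from the source.

```python
from statistics import NormalDist


def percentile_gain(p, mean_x, sd_x, mean_y, sd_y):
    """Return (initial score, gain) at percentile p for two normal distributions."""
    z = NormalDist().inv_cdf(p)          # standard normal deviate at percentile p
    x = mean_x + z * sd_x                # equivalent level on the initial test
    y = mean_y + z * sd_y                # equivalent level on the final test
    return x, y - x
```

With $\sigma_x$ = 10 and $\sigma_y$ = 12, the gain rises by .20 points for each point of initial score, whatever the means and whatever the correlation between the two tests.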
If it is assumed that $r_{xy} = 1$, then equation (62) becomes

$$b_{(y-x)E_x} = \frac{(\sigma_y/\sigma_x)^2 - 1}{(\sigma_y/\sigma_x) + 1} = (\sigma_y/\sigma_x) - 1 = b_{gx}. \tag{65}$$

Thus if $r_{xy}$ is equated to unity the Wiseman-Wrigley and the percentile
methods give identical results.

The estimates of linear regression of gains upon initial scores obtained
by each of the three methods here considered increase as the ratio of $\sigma_y$
to $\sigma_x$ increases. Using the proposed method, the regression of gains also
increases as the correlation between x and y increases. But the percentile method
is independent of this correlation and in the Wiseman-Wrigley method the
gain regression increases as this correlation decreases. The proposed method
would seem, therefore, to be superior.


VI. Example 


Suppose the following data are obtained: $\sigma_y$ = 12, $\sigma_x$ = 10, $r_{xy}$ = .80,
$C_{zz'}$ = 23 (i.e., $r_{zz'}$ = .85), N = 400 and there is no evidence of curvilinear
regression. Then, by (10), $r_{xx'}$ = (4 × 23)/100 = .92 and, by (8), $b_{Gx}$ =
[(12 × .8)/10] - .92 = .04. According to the Wiseman-Wrigley method,
using (62), the corresponding coefficient is .22 and the percentile method
gives a coefficient of .20 [equation (64)].

If $r_{xy}$ is .90, instead of .80, then $b_{Gx}$ = .16. The Wiseman-Wrigley
coefficient = .21 and the percentile coefficient remains unchanged at .20.

When $r_{xy}$ is high, as will be the case if reliable tests are used, the
Wiseman-Wrigley method and the percentile method give similar results. These
methods, however, may give results dissimilar to those obtained by the
method here proposed.
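
The three coefficients compared in this example follow directly from equations (8), (62), and (64). The sketch below reproduces the arithmetic; the function names are hypothetical.

```python
def proposed(sd_x, sd_y, r_xy, r_xx):
    """Equation (8): b_Gx = b_yx - r_xx'."""
    return r_xy * sd_y / sd_x - r_xx


def wiseman_wrigley(sd_x, sd_y, r_xy):
    """Equation (62): ((V_y / V_x) - 1) / (b_yx + 1)."""
    b_yx = r_xy * sd_y / sd_x
    return ((sd_y / sd_x) ** 2 - 1) / (b_yx + 1)


def percentile(sd_x, sd_y):
    """Equation (64): (sd_y / sd_x) - 1."""
    return sd_y / sd_x - 1


for r_xy in (0.80, 0.90):
    print(r_xy,
          round(proposed(10, 12, r_xy, 0.92), 2),
          round(wiseman_wrigley(10, 12, r_xy), 2),
          round(percentile(10, 12), 2))
```

The loop prints .04, .22, .20 for $r_{xy}$ = .80 and .16, .21, .20 for $r_{xy}$ = .90, in agreement with the figures quoted above.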
A METHOD OF SCALOGRAM ANALYSIS USING SUMMARY
STATISTICS* 


Bert F. GREEN 
MASSACHUSETTS INSTITUTE OF TECHNOLOGY 


A method of Guttman scalogram analysis is presented that does not 
involve sorting and rearranging the entries in the item response matrix. The 
method requires dichotomous items. Formulas are presented for estimating 
the reproducibility of the scale and estimating the expected value of the chance 
reproducibility. An index of consistency is suggested for evaluating the repro- 
ducibility. An illustrative example is presented in detail. The logical basis of 
the method is discussed. Finally, several methods are suggested for dealing 
with non-dichotomous items. 


Guttman’s scaling method, known as scalogram analysis (4), has become 
popular among social scientists. However, current techniques for scalogram 
analysis are cumbersome. They all deal directly with the raw data in the form 
of an item response matrix that has a row for each respondent and a column for 
each item response category. An entry in the matrix indicates whether a 
particular respondent gave a particular item response. Various procedures 
have been described for rearranging the rows and columns of the item response 
matrix, as well as for combining response categories, so that a response 
“parallelogram” is achieved with few deviations. Suchman (12) described 
the scalogram board procedure in which the response matrix is represented 
by buckshot placed in small indentations in a set of removable slats. The 
sorting is accomplished by interchanging these slats in their frame. Methods 
for tabulating the response matrix on IBM equipment have been described 
by Noland (10), Ford (2), and Kahn and Bodine (6). Paper and pencil 
methods have been described by Guttman (3) and Marder (8). 

These techniques are not automatic, but require keen judgment con- 
cerning the kind of sorting likely to pay off. Furthermore, the techniques 
are cumbersome since each attempts to evaluate the complete raw data 
matrix without the aid of summary statistics. For large numbers of respond- 
ents, the task is overwhelming. Moreover, it is difficult to deal with more than 
10-20 items with these procedures. [A method, called H-technique, for 
combining items before making the scalogram analysis has been reported 
by Stouffer, Borgatta, Hays, and Henry (11)]. 


*Lois K. Anderson assisted the author materially in the many computations required 
for this paper. The research reported in this paper was supported in part by the Department 
of Economics and Social Sciences at M.I.T. and in part, jointly, by the Army, Navy and 
Air Force under contract with the Massachusetts Institute of Technology. 
The purpose of this paper is to present a relatively simple method of
scalogram analysis in which summary statistics are used to compute a close
approximation to the scale rep. (In this paper "rep" is used for
"reproducibility.") In this method, which requires dichotomous items, there is
no limitation on the number of respondents; its application to large numbers 
of items is relatively easy. The method is particularly well-suited to punched- 
card techniques of processing data, since one must merely count the number 
of respondents who gave the positive response to each item, and the number 
of respondents giving certain specified combinations of responses. Obtaining 
these summary statistics is a simple, routine, completely objective matter. 

In a sense, the method proposed here removes scalogram analysis from
the list of subjective, slightly mystical techniques available only to ex- 
perienced practitioners and places it on the list of objective methods available 
to any statistical clerk. The method also substantially reduces the time 
required for analysis. It gains these advantages at the expense of providing 
only an approximation to the ‘true’ rep. Certain high-order scale errors 
are ignored. However the approximation appears to be a very close one. 


The Method 


All items must be dichotomous. In the mathematical notation we will
let k be the number of items, N be the number of respondents, i be a
subscript referring to item i (where the items are in any arbitrary order), and
g be a subscript referring to item g in rank order.

Step 1. Designate the positive response to each item by referring to the 
item content. The positive response designations should be consistent with 
the investigator’s hypothesis concerning the dimension being scaled. 

Step 2. For each item tabulate $n_i$, the number of respondents who
gave the positive response to the item.

Step 3. Arrange the items in rank order according to their popularities,
($n_i/N$), with the least popular item getting rank k, and the most popular
item getting rank 1. If there are any ties, adopt an arbitrary order.

Step 4. Tabulate $n_{g+1,\bar{g}}$ for g = 1, 2, ···, k - 1. This is the number of
respondents who gave the positive response to item g + 1 and the negative
response to item g. If it is easier to tabulate $n_{g+1,g}$ or $n_{\overline{g+1},\bar{g}}$, then the following
identities can be used:

$$n_{g+1,\bar{g}} = n_{g+1} - n_{g+1,g},$$

$$n_{g+1,g} = n_{\overline{g+1},\bar{g}} + n_{g+1} - n_{\bar{g}}.$$

Step 5. Use either of the following two methods for estimating the rep.

A. Tabulate $n_{g+2,g+1,\bar{g},\overline{g-1}}$ for g = 2, 3, ···, k - 2. This is the number
of respondents who gave the positive response to item g + 2, and the positive
response to item g + 1, and the negative response to item g, and the negative
response to item g - 1. Estimate the rep from the formula

$$\mathrm{Rep}_A = 1 - \frac{1}{Nk}\sum_{g=1}^{k-1}n_{g+1,\bar{g}} - \frac{1}{Nk}\sum_{g=2}^{k-2}n_{g+2,g+1,\bar{g},\overline{g-1}}.$$

B. Tabulate $n_{g+2,\bar{g}}$ for g = 1, 2, ···, k - 2. This is the number of
respondents who gave the positive response to item g + 2 and the negative
response to item g. Estimate the rep from the formula

$$\mathrm{Rep}_B = 1 - \frac{1}{Nk}\sum_{g=1}^{k-1}n_{g+1,\bar{g}} - \frac{1}{N^2k}\sum_{g=2}^{k-2}n_{g+2,\bar{g}}\,n_{g+1,\overline{g-1}}.$$

$\mathrm{Rep}_A$ and $\mathrm{Rep}_B$ should yield very similar estimates. The choice depends
primarily on the relative ease of the alternative tabulations. $\mathrm{Rep}_A$ has the
advantage that it is known to be an overestimate of the true (sample) rep.

The standard error of either $\mathrm{Rep}_A$ or $\mathrm{Rep}_B$ is approximately given by

$$\sigma_{\mathrm{Rep}} \approx \sqrt{\frac{(1 - \mathrm{Rep})(\mathrm{Rep})}{Nk}}.$$


Step 6. (Optional). Estimate the rep that would be expected by chance
if the items had their observed popularities but were mutually independent.
The rep of independent items is estimated by the formula

$$\mathrm{Rep}_I = 1 - \frac{1}{N^2k}\sum_{g=1}^{k-1}n_{g+1}\,n_{\bar{g}} - \frac{1}{N^4k}\sum_{g=2}^{k-2}n_{g+2}\,n_{g+1}\,n_{\bar{g}}\,n_{\overline{g-1}}.$$

(Note that $n_{\bar{g}} = N - n_g$.)
Compute the Index of Consistency,

$$I = \frac{\mathrm{Rep} - \mathrm{Rep}_I}{1 - \mathrm{Rep}_I},$$

where Rep is either $\mathrm{Rep}_A$ or $\mathrm{Rep}_B$. The index I will be unity if the items
are perfectly scalable and has an expected value of zero when the items are
independent. If the items show some negative correlation in the sample, I
will be negative. If desired, label the set of items "scalable" if I is greater
than .50.

Step 7. Give each respondent a scale score that is the number of items 
to which he gave the positive response. 


Illustrative Example 


A set of hypothetical data with N = 20, and k = 6 will be used as an 
example of the application of the method. The hypothetical data are shown in 
Table 1. The tabulations for Steps 2, 4, 5A, and 5B are also shown in Table 
1. We have put the items and the respondents in rank order in Table 1 only to 
provide an easy comparison with the usual sorting techniques. In carrying 
out the tabulations of Steps 2, 4, and 5, it is not at all necessary that the
raw data be arranged with either items or respondents in any particular
order. It is quite possible to work directly with the individual response
sheets, or with their punched-card equivalents.

TABLE 1
Data and Tabulation for Illustrative Example
(+ = positive response; - = negative response)

                                                  Items (i)
                                          2    1    3    4    5    6
                                                Rank Order (g)          Scale
Respondents                               6    5    4    3    2    1    Scores

     1                                    +    +    +    +    +    +      6
     2                                    +    +    +    +    -    +      5
     3                                    +    -    +    +    +    +      5
     4                                    +    +    -    +    +    +      5
     5                                    -    +    +    +    +    +      5
     6                                    +    +    -    -    +    +      4
     7                                    -    +    +    -    +    +      4
     8                                    +    -    +    +    -    +      4
     9                                    -    -    +    +    +    +      4
    10                                    -    +    -    +    +    +      4
    11                                    +    +    +    -    -    -      3
    12                                    -    +    +    -    -    +      3
    13                                    -    -    +    -    +    +      3
    14                                    -    -    -    +    +    +      3
    15                                    -    -    -    +    +    -      2
    16                                    -    -    -    -    +    +      2
    17                                    -    -    -    -    +    +      2
    18                                    -    -    -    -    +    -      1
    19                                    -    -    -    -    -    +      1
    20                                    -    -    -    -    -    -      0

(Step 2)  $n_g$                           7    9   10   10   14   16
(Step 6)  $n_{\bar{g}}$                  13   11   10   10    6    4
(Step 4)  $n_{g+1,\bar{g}}$               -    2    3    4    2    2
(Step 5A) $n_{g+2,g+1,\bar{g},\overline{g-1}}$   -    -    1    2    0    -
(Step 5B) $n_{g+2,\bar{g}}$               -    -    2    4    4    1
From the formulas in Step 5, we compute

$$\mathrm{Rep}_A = 1 - \tfrac{1}{120}(2 + 2 + 4 + 3 + 2) - \tfrac{1}{120}(0 + 2 + 1) = .867,$$

$$\mathrm{Rep}_B = 1 - \tfrac{1}{120}(2 + 2 + 4 + 3 + 2) - \tfrac{1}{2400}(1 \cdot 4 + 4 \cdot 4 + 4 \cdot 2) = .880.$$

(The actual reproducibility is .858. The large discrepancy between this
figure and the estimates is due to the small N in this example.) For Step 6
we compute

$$\mathrm{Rep}_I = 1 - \tfrac{1}{2400}(14 \cdot 4 + 10 \cdot 6 + 10 \cdot 10 + 9 \cdot 10 + 7 \cdot 11) - \tfrac{1}{960{,}000}(10 \cdot 10 \cdot 6 \cdot 4 + 9 \cdot 10 \cdot 10 \cdot 6 + 7 \cdot 9 \cdot 10 \cdot 10) = .826,$$

$$I = \frac{.867 - .826}{1 - .826} = .236.$$

Since I is less than .50, the set of items is not "scalable." The scale scores
are shown in Table 1.
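
The tabulations of Steps 2 through 6 can be sketched as a short program. The block below is an illustration, not the author's punched-card procedure: `ROWS` transcribes the Table 1 responses as read here, with columns already in rank order g = 6, 5, ..., 1, and the function and variable names are hypothetical.

```python
# Table 1 as read here: one string per respondent, columns in rank order g = 6, ..., 1.
ROWS = [
    "++++++", "++++-+", "+-++++", "++-+++", "-+++++",
    "++--++", "-++-++", "+-++-+", "--++++", "-+-+++",
    "+++---", "-++--+", "--+-++", "---+++", "---++-",
    "----++", "----++", "----+-", "-----+", "------",
]


def summary_rep(rows):
    """Return (Rep_A, Rep_B, Rep_I, I) for responses already arranged in rank order."""
    n_resp, k = len(rows), len(rows[0])

    def positive(row, g):
        # Rank g sits in column k - g because ranks run k, ..., 1 from left to right.
        return row[k - g] == "+"

    def count(plus, minus):
        # Respondents positive on every rank in `plus` and negative on every rank in `minus`.
        return sum(
            all(positive(r, g) for g in plus) and not any(positive(r, g) for g in minus)
            for r in rows
        )

    n = {g: count([g], []) for g in range(1, k + 1)}          # Step 2
    n_bar = {g: n_resp - n[g] for g in n}
    first = sum(count([g + 1], [g]) for g in range(1, k))     # Step 4
    second_a = sum(count([g + 2, g + 1], [g, g - 1]) for g in range(2, k - 1))
    second_b = sum(count([g + 2], [g]) * count([g + 1], [g - 1]) for g in range(2, k - 1))
    rep_a = 1 - first / (n_resp * k) - second_a / (n_resp * k)          # Step 5A
    rep_b = 1 - first / (n_resp * k) - second_b / (n_resp ** 2 * k)     # Step 5B
    chance_1 = sum(n[g + 1] * n_bar[g] for g in range(1, k))
    chance_2 = sum(n[g + 2] * n[g + 1] * n_bar[g] * n_bar[g - 1] for g in range(2, k - 1))
    rep_i = 1 - chance_1 / (n_resp ** 2 * k) - chance_2 / (n_resp ** 4 * k)   # Step 6
    return rep_a, rep_b, rep_i, (rep_a - rep_i) / (1 - rep_i)
```

Applied to `ROWS`, the sketch gives $\mathrm{Rep}_A$ = .867, $\mathrm{Rep}_B$ = .880, and $\mathrm{Rep}_I$ = .826. The index comes out near .235 when unrounded values are carried through, slightly below the .236 obtained above from the rounded figures.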


Justification of the Method 


Ordering the items. The simplicity of the method of scalogram analysis 
presented in this paper is due largely to the use of popularity to rank the items. 
Guttman and his followers have used the order of items that yielded the 
highest rep. In a large majority of the cases, this order turns out to be the 
popularity order. In the other cases, the difference in the rep for the ‘‘best”’ 
(highest rep) order and for the popularity order is very small. Thus almost 
nothing will be lost and great simplification will be gained by using the 
popularity order. An arbitrary order may be adopted for tied items. 

It is not surprising that the popularity order is usually the ‘‘best’’ 
order. In a perfect Guttman scale, the rank order of the items must correspond 
with the popularity order. For imperfect data, one would still expect the 
popularity order to be “best” if the scale errors are independent. Slight 
inversions might be expected if items had very similar popularities but 
the effect of these inversions would be small. Very peculiar error patterns 
would be required to make the popularity order markedly inferior to the 
"best" order.

Estimating Rep. The formulas for estimating the rep are based on an 
analysis of the scale errors in a pattern of item responses. In a perfect Guttman 
scale, the items can be arranged in a rank order so that a person who responds 
positively to (or endorses, or agrees with) any item also responds positively 
to all items of lower rank order. For a five-item scale, six ideal response 
patterns would be possible: [+++++], [-++++], [--+++],
[---++], [----+], and [-----]. In each ideal response pattern
there is a dividing point, or cut, such that all item responses to the left of
the cut are -, and all item responses to the right of the cut are +. The
number of scale errors in any other response pattern is determined by placing
a cut so that the number of +'s to the left of the cut and the number of -'s
to the right of the cut are minimized. All such "misplaced" responses are
errors. For example, the pattern [--++-] would have its cut between
items 4 and 3 (items are numbered in decreasing order from left to right, i.e.,
5, 4, 3, 2, 1) and one error would be counted. The pattern [-++--] could
have its cut between items 5 and 4 or at the right of item 1. In either case
there would be two errors.

In order to find a rule for counting the errors in any particular response 
pattern we must consider subpatterns of responses. First, consider a pair of 
adjacent items with the response subpattern (+ —); ie., the response to 
the higher ranking item is + and the response to the lower ranking item is 
—. We would place the cut either to the right or to the left of this pair, and 
would have one error from the pair in either case. We would not place the cut 
between the two items, since this would yield two errors. Next, consider the 
reduced response pattern formed by deleting such a pair from a complete 
response pattern. Clearly the number of errors in the complete pattern is 
exactly one more than the number in the reduced pattern, for when we have 
determined the location of the cut in the reduced pattern, the two deleted 
items can and must be replaced together on either side of this cut. We may 
successively reduce a response pattern by eliminating pairs of adjacent 
items with (+ —) subpatterns, until there are no remaining errors in the 
reduced pattern. The number of pairs eliminated is then the number of scale 
errors in the complete response pattern. For example, in the response pattern
[− + + − −] we first eliminate the pair (3, 2) (items are numbered 5, 4, 3,
2, 1), leaving (− + −); then we eliminate (4, 1), leaving (−). Hence there
are two errors in the original pattern.
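
The two counting rules described above can be compared directly. The following is a minimal sketch, not part of the original procedure: it assumes a response pattern is written as a string of "+" and "-" characters running from the highest-numbered item on the left to item 1 on the right, and the function names are hypothetical.

```python
from itertools import product


def errors_by_cut(pattern: str) -> int:
    """Smallest number of misplaced responses over every possible cut."""
    return min(
        pattern[:cut].count("+") + pattern[cut:].count("-")
        for cut in range(len(pattern) + 1)
    )


def errors_by_elimination(pattern: str) -> int:
    """Delete one adjacent (+ -) pair at a time and count the deletions."""
    deleted = 0
    while "+-" in pattern:
        pattern = pattern.replace("+-", "", 1)
        deleted += 1
    return deleted


def rules_agree(k: int) -> bool:
    """Check that both rules give the same count on all 2**k patterns."""
    return all(
        errors_by_cut("".join(p)) == errors_by_elimination("".join(p))
        for p in product("+-", repeat=k)
    )


print(errors_by_cut("--++-"))          # one error, cut between items 4 and 3
print(errors_by_elimination("-++--"))  # two errors
print(rules_agree(6))
```

The exhaustive check over all patterns of six items is only an illustration of the argument in the text, which holds for any number of items.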

In practice the first step is simultaneously to eliminate from the complete
response pattern all pairs of adjacent items with (+ −) subpatterns. These
are first-order errors. Next, we simultaneously eliminate all (+ −) sub-
patterns from the reduced pattern; these are the second-order errors. We
continue with third-, fourth-, and higher-order errors. Note that second-
order errors are represented in the complete response pattern by a subpattern
(+ + − −). The middle two items in this subpattern represent a first-order
error, while the other two items represent the second-order error that appears
when the first-order error is deleted. That is, second-order errors always
straddle first-order errors. In the same way, third-order errors straddle
second- and first-order errors, as in (+ + + − − −) or (+ + − + − −). In
general, higher-order errors always straddle lower-order errors.
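
The simultaneous elimination by order can be sketched as follows. This is an illustrative sketch only; the string encoding of a pattern ("+" and "-", highest-numbered item on the left) and the function name are assumptions made for the example.

```python
def errors_by_order(pattern: str) -> list[int]:
    """Return [first-order count, second-order count, ...] for one pattern.

    On each pass every adjacent (+ -) pair present in the current pattern
    is counted and removed at the same time; the next pass works on the
    reduced pattern.
    """
    counts = []
    while "+-" in pattern:
        counts.append(pattern.count("+-"))
        pattern = pattern.replace("+-", "")
    return counts


print(errors_by_order("++--"))    # one first-order and one second-order error
print(errors_by_order("+++---"))  # one error of each of the first three orders
```

The sum of the counts over all orders is the number of scale errors in the pattern.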

The formula for rep is

$$\mathrm{Rep} = 1 - \frac{E}{Nk}, \tag{1}$$

where E is the total number of errors, N is the number of respondents, and
k is the number of items. Then, we have

$$\mathrm{Rep} = 1 - \frac{1}{Nk}\sum(+\,-) - \frac{1}{Nk}\sum(+\,+\,-\,-) - \text{terms of higher order}. \tag{2}$$

Now the sum of all adjacent item errors (+ −) is

$$n_{2,\bar{1}} + n_{3,\bar{2}} + \cdots + n_{k,\overline{k-1}} = \sum_{g=1}^{k-1} n_{g+1,\bar{g}}. \tag{3}$$

Similarly,

$$\sum(+\,+\,-\,-) = \sum_{g=2}^{k-2} n_{g+2,\,g+1,\,\bar{g},\,\overline{g-1}}. \tag{4}$$


Since the higher-order error patterns occur very infrequently, we will dis-
regard all terms higher than second order. Substituting (3) and (4) in (2)
we obtain the formula for Rep_A that we presented in Step 5A.

Instead of tabulating Σ(+ + − −), we may estimate this quantity.
We shall assume that the two errors represented by the pattern (+ + − −)
are independent. That is, the probability of a response pattern (+ 0 − 0),
where 0 symbolizes either response, is independent of the probability of a
response pattern (0 + 0 −). Then the product of these two probabilities is
the probability of a response pattern (+ + − −). Under this assumption,

$$\frac{n_{g+2,\,g+1,\,\bar{g},\,\overline{g-1}}}{N} = \frac{n_{g+2,\,\bar{g}}}{N} \cdot \frac{n_{g+1,\,\overline{g-1}}}{N}. \tag{5}$$

Hence,

$$\sum_{g=2}^{k-2} n_{g+2,\,g+1,\,\bar{g},\,\overline{g-1}} = \frac{1}{N}\sum_{g=2}^{k-2} n_{g+2,\,\bar{g}}\; n_{g+1,\,\overline{g-1}}. \tag{6}$$


Substituting (3) in (2), and substituting (6) in (4) and the result in (2), we
have the formula for Rep_B given in Step 5B.
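
As a rough illustration of the two estimates, the sketch below computes both from raw response data by following (2) through (6). It is a sketch under stated assumptions rather than the tabulation procedure of the earlier steps: each respondent is assumed to be stored as a sequence of booleans indexed so that position g − 1 holds the response to item g, a barred subscript is taken to mean a − response to that item, and the names `rows_from_patterns`, `rep_a_and_b`, and `n` are hypothetical.

```python
def rows_from_patterns(patterns):
    """Convert '+'/'-' strings (item k on the left, item 1 on the right)
    into tuples r with r[g - 1] True when the response to item g is +."""
    return [tuple(c == "+" for c in reversed(p)) for p in patterns]


def rep_a_and_b(rows):
    """Return (Rep_A, Rep_B), keeping first- and second-order terms only."""
    N, k = len(rows), len(rows[0])

    def n(plus, minus):
        # Respondents answering + on every item in `plus`
        # and - on every item in `minus` (items numbered from 1).
        return sum(
            all(r[g - 1] for g in plus) and not any(r[g - 1] for g in minus)
            for r in rows
        )

    # Equation (3): adjacent (+ -) errors, g = 1, ..., k - 1.
    first = sum(n([g + 1], [g]) for g in range(1, k))
    # Equation (4): tabulated (+ + - -) patterns, g = 2, ..., k - 2.
    second_tabulated = sum(n([g + 2, g + 1], [g, g - 1]) for g in range(2, k - 1))
    # Equation (6): the same sum estimated under independence.
    second_estimated = sum(
        n([g + 2], [g]) * n([g + 1], [g - 1]) for g in range(2, k - 1)
    ) / N

    rep_a = 1 - first / (N * k) - second_tabulated / (N * k)
    rep_b = 1 - first / (N * k) - second_estimated / (N * k)
    return rep_a, rep_b


perfect = rows_from_patterns(["----", "---+", "--++", "-+++", "++++"])
print(rep_a_and_b(perfect))
```

For a set of ideal response patterns both estimates equal 1, since no respondent answers + to a higher item and − to a lower one.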

A cruder approximation to rep can be obtained by ignoring the second-
order error patterns, i.e., by omitting the last term from Rep_A or Rep_B. This
approximation may be satisfactory in some cases, but it can be improved
considerably by including the last term.

The formulas for Rep_A and Rep_B have been checked empirically using
data published by Suchman (12, 13). Table 2 gives the correct rep and our
estimated Rep_A and Rep_B for ten scales of dichotomous items. The average
discrepancy is .002 for Rep_A and .003 for Rep_B. It should be pointed out that
the actual reps are not necessarily those reported by Suchman since we have 
dichotomized all items and have not rejected any items from the scale. For 
larger numbers of items, the discrepancies may be slightly larger than those 
obtained here. 

The errors in the proposed estimates of rep occur because terms of
third and higher orders are neglected. It is possible to estimate the extent
of the error by assuming that scale errors are independent. Both empirical 
and analytic results show that the error is about (1 — Rep)*/Rep. 

TABLE 2
Empirical Comparison of Rep_A and Rep_B with Correct Rep

Scalogram No. and Page Reference              Correct
to Suchman (12, 13)                k     N      Rep     Rep_A    Rep_B

0.  p. 38                         12    100    0.895    0.897    0.898
1.  p. 12?                         9    100    0.923    0.924    0.925
2.  p. 126                        14    100    0.870    0.871    0.876
3.  p. 130                         9    100    0.890    0.888    0.895
4.  p. 134                         5    100    0.962    0.962    0.964
5.  p. 136                         7    100    0.970    0.970    0.970
6.  p. 138                         7    100    0.929    0.929    0.929
7.  p. 140                        10    100    0.913    0.917    0.918
8.  p. 146                         8    100    0.825    0.831    0.829
9.  p. 148                         6    100    0.943    0.945    0.946

The variance of Rep_A or Rep_B can be calculated. The calculations are
straightforward but long, and lead to a complicated formula that will not
be presented here. A satisfactory approximation, suggested by Guttman (4),
is given by the simple formula presented in Step 5 above. The factor of k in 
the denominator leads to very small sampling variances which tend to give 
the investigator a false sense of security. It should be noted that errors of 
unreliability are usually much larger than sampling errors in this situation. 
Indeed, the errors in our approximations for rep are of the same order of 
magnitude as the sampling standard deviation. 

Chance Rep. One of the major advantages of the method of scalogram
analysis described here is the ease with which the chance rep, Rep_I, can be
obtained. Rep_I is the rep that would be expected by chance, if the items were
actually independent, and is a function of the item popularities, or
“marginals.” To obtain Rep_I, we note that if the items are independent, then

$$n_{g+1,\,\bar{g}} = \frac{n_{g+1}\, n_{\bar{g}}}{N}; \qquad n_{g+2,\,g+1,\,\bar{g},\,\overline{g-1}} = \frac{n_{g+2}\, n_{g+1}\, n_{\bar{g}}\, n_{\overline{g-1}}}{N^{3}}.$$

Substituting these values in the formula for Rep_A, we obtain the formula for
Rep_I shown in Step 6 above.
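
The substitution just described can be sketched in a few lines. This sketch is derived here by inserting the independence values into (2), (3), and (4); it is not copied from Step 6, and the function name and argument layout are assumptions. The list `n_plus` is assumed to hold the number of + responses to items 1, 2, ..., k in that order.

```python
def chance_rep(n_plus, N):
    """Chance rep from the item marginals, to second order.

    n_plus[g - 1] is the number of + responses to item g; the number of
    - responses to item g is N - n_plus[g - 1].
    """
    k = len(n_plus)
    n_minus = [N - n for n in n_plus]

    # Expected adjacent (+ -) errors: sum over g = 1..k-1 of
    # n_{g+1} * n_{g-bar} / N.
    first = sum(n_plus[g] * n_minus[g - 1] for g in range(1, k)) / N

    # Expected (+ + - -) patterns: sum over g = 2..k-2 of
    # n_{g+2} * n_{g+1} * n_{g-bar} * n_{(g-1)-bar} / N**3.
    second = sum(
        n_plus[g + 1] * n_plus[g] * n_minus[g - 1] * n_minus[g - 2]
        for g in range(2, k - 1)
    ) / N ** 3

    return 1 - (first + second) / (N * k)


print(chance_rep([5, 5, 5, 5], N=10))
```

Only the marginals enter the calculation, which is why the chance rep is so easily obtained once the item popularities are known.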

The hypothesis that Rep_A (or Rep_B) is not significantly larger than
Rep_I can be tested by using the variance estimate presented in Step 5. Caution
is necessary when such a test yields borderline significance because of the 
many approximations involved. It should be noted that a significant rep does 
not necessarily indicate a homogeneous scale. For homogeneity the inter-
correlations of the items should be fairly high, as well as significantly non-
zero. It is possible to construct an index of homogeneity that will be zero
when Rep_A (or Rep_B) = Rep_I, and will be unity when Rep_A = 1. The index
I, presented in Step 6, has these properties. It will be affected very little by
changes in the number of items, or in the item popularities. This index is very 
similar to Loevinger’s (7) index of homogeneity and to Menzel’s (9) co- 
efficient of scalability, and is proposed only because it is easier to compute 
in the present circumstance. 
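
An index with the two stated properties can be sketched as below. The exact formula of Step 6 is not restated in this section, so the linear form used here is an assumption chosen only because it is zero when the estimated rep equals the chance rep and unity when the estimated rep is 1; the function name is hypothetical.

```python
def consistency_index(rep, rep_chance):
    """Assumed form: (Rep - Rep_I) / (1 - Rep_I)."""
    if rep_chance >= 1:
        raise ValueError("index is undefined when the chance rep is 1")
    return (rep - rep_chance) / (1 - rep_chance)


print(consistency_index(0.9, 0.8))
```

An index of this form rescales the estimated rep so that the chance level, rather than zero, is the baseline.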

According to Guttman (4), a set of items should meet several criteria 
in order to be considered “scalable,” or homogeneous. These criteria were 
apparently generated from anxiety about the chance rep [Festinger (1)]. 
With the exception of the requirement about a random pattern of errors, each 
criterion is related to the problem of avoiding spurious results and achieving 
homogeneity. It seems natural to replace them by a single criterion concern-
ing I: I should be .50 or more for scalability. This criterion appears to give
roughly comparable results to the many criteria used heretofore, and will be 
helpful to those who desire to create a dichotomy of scales vs. nonscales. To 
this author, it seems preferable to evaluate a scale in terms of an index of 
consistency that varies along a continuum rather than in terms of an arbitrary 
dichotomy. 

Scoring. Any scoring method may be used with the present method 
since none of the previous steps depends on the scale scores. The concept
of rep refers to the reproduction of an individual’s item responses from a
knowledge of his scale score. The measure of rep is a measure of the success 
of this reproduction when the scores are so assigned that the number of errors 
is minimized. It follows that the logically consistent method of scoring 
respondents is to compare their responses with the scale types. Each scale 
type or perfect response pattern is assigned a number, usually the number of 
positive responses in the response pattern. Then each individual’s response 
pattern is compared with the scale types. An individual’s score is the number 
assigned to the scale type that his responses match with the fewest deviations. 

When the scale is perfectly reproducible, the respondent’s score as 
determined by this scale-type-comparison process is equivalent to the number 
of positive responses that he made. Happily, it has been found [Suchman 
(12)] that the number of positive responses is very highly correlated with the 
scale-type-comparison score, when the scale is not perfectly reproducible. It 
appears that very little precision will be lost in practice by using the simple 
scoring method of counting positive responses. Although the issue is some- 
what academic for small numbers of items, the savings of time and trouble 
are sizable for large numbers of items. 
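
The two scoring methods can be compared on a single response pattern with a short sketch. The string encoding ("+" and "-", highest-numbered item on the left) and the function names are assumptions; the rule for breaking ties between equally close scale types is also an assumption, since the text does not specify one.

```python
def scale_type(score: int, k: int) -> str:
    """Ideal pattern with `score` positive responses among k items."""
    return "-" * (k - score) + "+" * score


def scale_type_score(pattern: str) -> int:
    """Score of the scale type that the pattern matches with the fewest
    deviations. Ties go to the lowest score (an arbitrary choice)."""
    k = len(pattern)

    def deviations(score: int) -> int:
        return sum(a != b for a, b in zip(pattern, scale_type(score, k)))

    return min(range(k + 1), key=deviations)


def count_score(pattern: str) -> int:
    """Simple score: the number of positive responses."""
    return pattern.count("+")


print(scale_type_score("--+++"), count_score("--+++"))
print(scale_type_score("--++-"), count_score("--++-"))
```

For an ideal pattern the two scores coincide; for a pattern containing an error they may differ by a small amount, as in the second example.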


Non-dichotomous Items 


In order to use the present method of analysis with items that have
more than two response alternatives, the investigator must dichotomize the
items. Several possibilities exist. He may use some logical a priori consider-
ations, perhaps depending on the content of the alternative item responses.
The splits might be determined in part by the popularities of the alternatives.
It would be possible to use some method such as that proposed by Jackson 
(5) for attempting to choose the dichotomies that maximize rep. For example, 
each respondent could be given a provisional scale score, and each possible 
dichotomy of each item could be correlated with this score; the dichotomies 
with highest correlations or covariances would be chosen. 

An alternative procedure is to use all possible dichotomies. That is, an 
n-alternative item is replaced by (n — 1) dichotomous items, or pseudo-items. 
For this procedure, it is necessary to adjust the value of k in the formulas. 
Wherever k appears as a factor, in conjunction with N, it remains the number 
of original non-dichotomous items, but in the limits of summation, k becomes 
the total number of pseudo-items. The pseudo-item trick will provide a 
crude approximation to the rep of a set of non-dichotomous items. This 
approximation is always slightly too low, but the value of I should be more
accurate since the same bias exists in both Rep_A (or Rep_B) and Rep_I.
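
The pseudo-item device can be sketched as follows. The sketch assumes that the alternatives of each item are ordered and coded 0, 1, ..., n − 1, and that the (n − 1) dichotomies are the cumulative splits "response at or above alternative j"; both the coding and the function names are assumptions made for illustration.

```python
def pseudo_items(response: int, n_alternatives: int) -> list[bool]:
    """Replace one n-alternative response by n - 1 dichotomous responses.

    Pseudo-item j (j = 1, ..., n - 1) is + when the response is at or
    above alternative j.
    """
    return [response >= cut for cut in range(1, n_alternatives)]


def expand_respondent(responses, alternatives):
    """Concatenate the pseudo-items of every original item."""
    expanded = []
    for response, n in zip(responses, alternatives):
        expanded.extend(pseudo_items(response, n))
    return expanded


alternatives = [4, 3, 2]                        # three original items
row = expand_respondent([2, 0, 1], alternatives)
k_as_factor = len(alternatives)                 # used with N in the formulas
k_in_limits = len(row)                          # used in the limits of summation
print(row, k_as_factor, k_in_limits)
```

The two values of k at the end correspond to the adjustment described above: the number of original items where k multiplies N, and the total number of pseudo-items in the limits of summation.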


NOTE ON CARROLL’S ANALYTIC SIMPLE STRUCTURE


Henry F. Kaiser 
UNIVERSITY OF CALIFORNIA 


A method for computing the transformation matrix for Carroll’s analytic 
simple structure (1) is presented. The procedure involves successively finding 
the smallest latent root and associated vector of symmetric matrices. 


A mathematical method of carrying out the iterations required in 
applying Carroll’s analytic criterion for oblique simple structure (1) is 
developed. Since the trial and error procedure Carroll recommends occurs 
only in the computational stage of the solution, the general treatment of 
the problem will not be repeated. The notation used here is that of the 
original paper. 

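Carroll’s criterion is that

$$f = \sum_{p=1}^{t-1}\sum_{q=p+1}^{t}\sum_{j=1}^{n} v_{jp}^{2}\, v_{jq}^{2} \tag{1}$$

$$\phantom{f} = \sum_{p=1}^{t-1}\sum_{q=p+1}^{t}\sum_{j=1}^{n} \Bigl(\sum_{m=1}^{s} a_{jm}\lambda_{mp}\Bigr)^{2} \Bigl(\sum_{m=1}^{s} a_{jm}\lambda_{mq}\Bigr)^{2} \tag{2}$$

be a minimum, under the t conditions

$$\sum_{m=1}^{s} \lambda_{mp}^{2} - 1 = 0. \tag{3}$$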
He reaches a solution by systematically varying the elements of the 
transformation matrix A until the value of f attains a minimum. To this 
end, he is able to write 


$$f_x = M_x\, AM_x, \tag{4}$$

where

$$M_x = [\lambda_{1x}^{2},\ \lambda_{2x}^{2},\ \ldots,\ \lambda_{sx}^{2},\ \lambda_{1x}\lambda_{2x},\ \lambda_{1x}\lambda_{3x},\ \ldots,\ \lambda_{1x}\lambda_{sx},\ \lambda_{2x}\lambda_{3x},\ \ldots,\ \lambda_{2x}\lambda_{sx},\ \ldots,\ \lambda_{(s-1)x}\lambda_{sx}] \tag{5}$$

is a row vector with s(s + 1)/2 variable elements, consisting of second-
degree terms of the elements of column x of the transformation matrix A;
where AM_x is a column vector whose elements are rather complicated
functions of the elements of the arbitrary reference factor matrix F and
the elements of the columns other than x in A; and where f_x is that part of
f which varies as a function of the elements of column x of A. Thus, in any
one iteration f_x must be minimized with respect to λ_mx.
The s(s + 1)/2 constant elements of AM_x are rewritten as

$$AM_x = \{b_{11},\ b_{22},\ \ldots,\ b_{ss},\ 2b_{12},\ 2b_{13},\ \ldots,\ 2b_{1s},\ 2b_{23},\ \ldots,\ 2b_{2s},\ \ldots,\ 2b_{(s-1)s}\}, \tag{6}$$

and the symmetric matrix B is defined as

$$B = \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1s} \\ b_{21} & b_{22} & \cdots & b_{2s} \\ \vdots & \vdots & & \vdots \\ b_{s1} & b_{s2} & \cdots & b_{ss} \end{bmatrix}, \qquad (b_{mk} = b_{km}). \tag{7}$$

[Note that the elements of the principal diagonal of B are the first s elements
of AM_x, while the off-diagonal elements, with proper regard for subscripts,
are one-half the value of the final s(s − 1)/2 elements of AM_x.]
From (5) and (6), equation (4) becomes

$$f_x = \sum_{m=1}^{s} b_{mm}\lambda_{mx}^{2} + 2\sum_{m=1}^{s-1}\sum_{k=m+1}^{s} b_{mk}\lambda_{mx}\lambda_{kx}. \tag{8}$$

But the λ_mx are constrained by (3), so that to minimize f_x let

$$2\theta = f_x - \mu\Bigl(\sum_{m=1}^{s}\lambda_{mx}^{2} - 1\Bigr), \tag{9}$$

where μ is a Lagrangian multiplier.
To find critical values of λ_mx, set the partial derivative of (9), with respect
to any one of the s variables λ_mx, equal to zero:

$$\partial\theta/\partial\lambda_{mx} = (b_{mm} - \mu)\lambda_{mx} + \sum_{k \neq m} b_{mk}\lambda_{kx} = 0. \tag{10}$$

If (10) is multiplied by λ_mx, and then summed over m,

$$\sum_{m=1}^{s} b_{mm}\lambda_{mx}^{2} + \sum_{m=1}^{s}\sum_{k \neq m} b_{mk}\lambda_{mx}\lambda_{kx} - \mu\sum_{m=1}^{s}\lambda_{mx}^{2} = 0. \tag{11}$$

Upon using (5) and (3),

$$\mu = f_x. \tag{12}$$

Thus writing out (10) explicitly for m = 1, 2, ..., s gives
$$\begin{aligned}
(b_{11} - f_x)\lambda_{1x} + b_{12}\lambda_{2x} + \cdots + b_{1s}\lambda_{sx} &= 0 \\
b_{21}\lambda_{1x} + (b_{22} - f_x)\lambda_{2x} + \cdots + b_{2s}\lambda_{sx} &= 0 \\
\cdots\qquad & \\
b_{s1}\lambda_{1x} + b_{s2}\lambda_{2x} + \cdots + (b_{ss} - f_x)\lambda_{sx} &= 0.
\end{aligned} \tag{13}$$

Since these are homogeneous linear equations, the determinant of the
coefficients of λ_mx must vanish:

$$\begin{vmatrix}
(b_{11} - f_x) & b_{12} & \cdots & b_{1s} \\
b_{21} & (b_{22} - f_x) & \cdots & b_{2s} \\
\vdots & \vdots & & \vdots \\
b_{s1} & b_{s2} & \cdots & (b_{ss} - f_x)
\end{vmatrix} = 0. \tag{14}$$

Now (14) is obviously the characteristic equation of B, any latent root
of which will make the determinant vanish. In order to minimize f_x, the
smallest latent root is taken. The elements of the latent vector associated
with this root are obviously the desired solutions for λ_mx.

For example, in Carroll’s first iteration for Thurstone’s box problem
(1, p. 34), B becomes

$$B = \begin{bmatrix} 3.981 & -.087 & .064 \\ -.087 & 1.705 & .144 \\ .064 & .144 & 1.263 \end{bmatrix}.$$

Employing Tintner’s straightforward computational scheme for de-
termining a smallest latent root and its associated latent vector (2, pp. 357-8),

$$\begin{aligned}
\lambda_{1x} &= -.025 \quad (.000), \\
\lambda_{2x} &= -.287 \quad (-.292), \\
\lambda_{3x} &= .957 \quad (.956), \\
f_x &= 1.218 \quad (1.219).
\end{aligned}$$

The results in parentheses are those obtained by Carroll by means of a trial
and error procedure.
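
The numerical example can be checked with a short sketch. Tintner’s scheme is not reproduced here; the sketch substitutes a plain shifted power iteration, which also converges to the smallest latent root of a symmetric matrix. The matrix entries are those of the example, with the first diagonal entry taken to be 3.981 and the off-diagonal entries taken to be symmetric; the function name is hypothetical.

```python
def smallest_latent_root(B, iterations=500):
    """Smallest latent root and unit-length latent vector of symmetric B.

    Power iteration is applied to (shift * I - B), where shift is an upper
    bound on the largest root (largest absolute row sum). The dominant
    vector of the shifted matrix belongs to the smallest root of B.
    """
    s = len(B)
    shift = max(sum(abs(x) for x in row) for row in B)
    v = [1.0] * s
    for _ in range(iterations):
        w = [shift * v[i] - sum(B[i][j] * v[j] for j in range(s)) for i in range(s)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]            # keeps the sum of squares at 1, as in (3)
    Bv = [sum(B[i][j] * v[j] for j in range(s)) for i in range(s)]
    root = sum(v[i] * Bv[i] for i in range(s))   # Rayleigh quotient
    if v[-1] < 0:                                # fix the arbitrary sign
        v = [-x for x in v]
    return root, v


B = [
    [3.981, -0.087, 0.064],
    [-0.087, 1.705, 0.144],
    [0.064, 0.144, 1.263],
]
root, vector = smallest_latent_root(B)
print(round(root, 3), [round(x, 3) for x in vector])
```

The root and the latent vector obtained this way should lie close to the values reported above; small differences in the first element of the vector are to be expected, since that element is small and sensitive to rounding.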


BOOK REVIEWS


Quinn McNemar. Psychological Statistics, Second Edition. New York, John Wiley and 
Sons, Inc., 1955, pp. vi + 408. 


The second edition of McNemar’s well-known text has the major virtues and faults of 
the first edition. Though the revisions are considerable, much of the original text remains 
unchanged, but practically all of the revisions represent improvements.

The primary virtue of this book—not shared, unfortunately, with some others in the 
same field—is its relatively complete freedom from major errors. The instructor who uses 
it will not have to spend much time explaining wherein and why the author’s discussions 
are incorrect. Its primary fault is still its unevenness of exposition: careful, detailed, and 
lucid on many topics, condensed on several others to the point where students are unable to 
acquire any real understanding. Apparently the author could not quite bring himself to 
omit entirely a number of topics which he evidently did not feel he had space to develop 
fully. If the instructor is willing to skip over these topics, he will find McNemar’s text excel- 
lent, but if he feels he must teach them thoroughly he will find it necessary to supplement 
the text frequently in his lectures and additional assignments. 

The first notable improvement in the revised edition is a rearrangement of the materi- 
als of the earlier chapters, with considerable re-writing at various points. The discussion of 
the t-test now precedes that of correlation, so that all the materials on the sampling theory 
of one variable follow consecutively; the treatment of the binomial distribution has been 
expanded and used more fully as a starting point for the ideas of statistical inference. 
Though seven chapters on the one-variable case now replace six, the total number of pages 
has increased only from 108 to 114; the reviewer still doubts that this treatment is sufficient 
for students taking their first course in statistics. 

The second major improvement is in the treatment of analysis of variance. A chapter 
(still too brief) on testing variances for homogeneity has been added, and the chapter on 
complex analysis of variance has been improved considerably. 

A four-page chapter on distribution-free methods presents the sign test, the median 
test, Mood’s test of C correlated sets, and the Mann-Whitney U-test. This chapter seems 
more like the result of an afterthought than a serious effort to discuss a now important area 
of modern statistical inference. 

The nine-page chapter entitled ‘Distribution Curves” describes only the normal dis- 
tribution and standard scores. The binomial distribution, and the normal approximation to 
it, are considered in the following chapter. The Poisson distribution is not discussed. 

A few minor errors and inconsistencies still remain. Thus on page 100 the standard
error of the mean of a sample from a finite population is given as σ√(1 − n/N)/√n (where
σ is the population standard deviation, N the population number, and n the sample number),
instead of as σ√(N − n)/√(n(N − 1)). Also on page 133 the standard error of estimate is
given as σ_x.y = σ_x√(1 − r²), omitting the factor √((N − 1)/(N − 2)). Similarly on pp. 138-9
the expression 1 − σ²_y.x/σ²_y is defined as the proportion of the y-variance determined by x,
again without the factor (N − 1)/(N − 2), though the correct formula is implied in the case
of multiple correlation on page 186. The so-called “shrunken” multiple correlation is in fact
the square root of the coefficient of non-determination, and in the two-variable case this
becomes r′² = 1 − (1 − r²)(N − 1)/(N − 2). McNemar calls the “shrunken” multiple
correlation the unbiased estimate of the multiple correlation in the population; if this is so,
r′ as defined above is the unbiased estimate of ρ. The reviewer has never seen a proof that
this is an unbiased estimate, and in fact it is probably not. For if we write it in the form
r′² = 1 − s²_y.x/s²_y, s²_y.x is an unbiased estimate of σ²_y.x, and s²_y is an unbiased estimate of σ²_y;
the unbiased estimate of a ratio is seldom the quotient of unbiased estimates of its numerator
and denominator, and the unbiased estimate of a square root is seldom the square root of
the unbiased estimate of the number. Aside from these and a few other liberties with such
ratios as N/(N − 1) and (N − 1)/(N − 2), however, there are few errors.
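
The size of the discrepancies noted in this paragraph can be gauged with a short sketch. It simply evaluates the formulas as quoted by the reviewer; the function names and the numerical values are illustrative assumptions and do not come from the book under review.

```python
import math


def se_mean_as_printed(sigma, n, N):
    """Form attributed to page 100: sigma * sqrt(1 - n/N) / sqrt(n)."""
    return sigma * math.sqrt(1 - n / N) / math.sqrt(n)


def se_mean_finite(sigma, n, N):
    """Form the reviewer gives: sigma * sqrt(N - n) / sqrt(n * (N - 1))."""
    return sigma * math.sqrt(N - n) / math.sqrt(n * (N - 1))


def shrunken_r_squared(r, N):
    """Two-variable case: r'^2 = 1 - (1 - r^2)(N - 1)/(N - 2)."""
    return 1 - (1 - r ** 2) * (N - 1) / (N - 2)


printed = se_mean_as_printed(10.0, 25, 100)
finite = se_mean_finite(10.0, 25, 100)
print(printed, finite, printed / finite)
print(shrunken_r_squared(0.5, 12))
```

The two standard errors differ only by the factor √((N − 1)/N), which is the kind of liberty with ratios such as N/(N − 1) that the reviewer describes.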

The formula for the large-sample standard error of (r_12 − r_13) has been replaced in the
revised edition by Hotelling’s t-test for this difference.

To the present reviewer, the author’s deliberate lack of emphasis on efficient computing 
methods appears to be a deficiency, but this is admittedly a matter of opinion wherein 
many other authors agree with McNemar. 

In summary, those instructors who liked the first edition of McNemar’s book will like 
the second edition better; those who did not like the first edition will probably have the 
same objections (though somewhat weakened) to the second. This reviewer, it may be added, 
is one of the former. 


University of Tennessee Edward E. Cureton 


R. L. THORNDIKE and ELIZABETH HAGEN. Measurement and Evaluation in Psychology and
Education. New York: Wiley, 1955, pp. vii + 575.


This book is an outgrowth of, and was developed for, a Teachers College course for 
teachers, administrators, guidance workers, and majors in several branches of psychology. 
“It undertakes to provide the foundations that these workers in different branches of edu-
cation and psychology will need in order to use and interpret tests, move ahead into more 
specialized testing courses, and go ahead independently to study their own practical testing 
problem.” (Preface, p. v). It seems admirably suited to these purposes; it should also be an 
excellent handbook for the school administrator who is not an expert on testing to peruse 
and to have available for reference. 

Many value-judgments are made throughout the book, as is inevitable if it is to be 
practically serviceable, but I think most measurement people would agree with most of the 
judgments. Those to which I take most violent exception are noted below. The general 
effect, however, is a sound evaluation of tests in terms of what they can and cannot con- 
tribute to the making of decisions, principally in education. 

In the first chapter, on historical and philosophical orientation, the authors depart from 
the usual nominal, ordinal, interval, and ratio categories of measurement, as proposed by 
Stevens, in favor of four other classes: either-or, qualitatively described degrees, rank in a 
group, and amount expressed in uniform established units. The first is identical with the 
Stevens nominal scale (one of the examples given refers to a man being either single, married, 
widowed, or divorced); the second and third are ordinal scales; and the fourth describes an 
interval scale, although the examples given—weight, height, and age—are ratio scales. The 
propaedeutic function of the book might have been better served by the use of standard 
terminology. 

After a chapter cataloging the different measurement options with respect to what 
is measured and how it is to be done, there follow two on teachers’ tests and on preparing 
objective tests. These chapters present a detailed plan for constructing a classroom test, 
built around a sample unit of instruction. Rules for item construction are given, together 
with examples of good and poor items. 

The chapter on elementary statistical concepts reviews the usual topics of central 
tendency, variability, and relationship; in addition it points out the dependence of the usual 
interpretations of standard deviation upon the presence of a normal distribution. The discus- 
sion of the correlation coefficient and its interpretation is particularly good for the level of 
sophistication the authors have chosen. 

The sixth chapter deals with test desiderata: validity, reliability, and practicality, and
culminates in a Schedule for Evaluating a Test. The discussion of reliability is especially
good, but the treatment of validity differs from that of the recent APA Technical Recom-
mendations, without any particular gain, it seems to me. Furthermore, the term “construct
validity” is used with a meaning different from that of the APA committee.

The four common types of norms—age, grade, percentile, and standard-score—are 
next described and evaluated. In the discussion of standard scores the authors seem to go 
completely off the track, and are actually perpetuating the prevalent fallacy that there is 
some magic in the process of subtracting the mean and dividing by the standard deviation. 
To quote (p. 165): “Because the units of a score system based on percentiles are so clearly
not equal, we are led to look for some other unit that does have the same meaning throughout
the whole range of values. Standard-score scales have been developed to serve this purpose.”
An example is given in which Mary gets a score 1.50 above the mean on test A and Johnny,
a member of the same class, gets the same standard score on test B. “Thus, we may say that
Mary did as well on test A as Johnny did on test B . . .” (p. 166). To aid in interpreting the
“degree of excellence represented by a standard score” the reader is referred to a table of
percentile equivalents in a normal distribution. This puts the authors in the peculiar position 
of beginning the section on standard scores with a statement on the inadequacy of percentile 
scores, and then midway in the discussion using them to interpret the supposedly superior 
standard scores. 

It is only several paragraphs beyond this that they take up the matter of normalized 
standard scores. In a summary, however, they say that standard scores are “. . . presumably
equal units. The basic unit is the number of standard deviation units above or below the 
mean of the group” (p. 168). 

It would seem to be too easy for the naive reader to get a false picture of the virtues of 
standard scores from this presentation. I should think it entirely within the grasp of the 
audience to whom the book is directed to understand the rationale for using a unit based on 
an assumed normal distribution of ability, and that non-normalized standard scores, being 
only linear transformations on raw scores, are no better than raw scores unless there is 
some kind of reference distribution or unless it is desired to compare performance across 
tests which have the same form of distribution. 

This same chapter has an excellent discussion of profile interpretation, emphasizing 
the necessity for taking into account the reliabilities of difference scores. 

Chapters 8 through 15 cover the topics which form the heart of the usual tests and 
measurements course: sources of information about tests, the various kinds of traits for 
which tests have been designed, and kinds of measuring instruments in current use. The 
general approach here has been to try to set forth principles governing various measurement 
techniques so as to give the reader a background for evaluation. Illustrative tests are dis- 
cussed briefly, and findings of some of the research studies in relevant areas are summarized. 

The last five chapters deal with planning a school testing program, marking and 
reporting, educational and vocational guidance, personnel selection, and diagnosis and 
therapy. In the chapter on Marking and Reporting, Thorndike and Hagen take the position 
that course marks can be only a relative appraisal, with respect to some reference group. 
They ignore the alternative of assigning marks based on the extent to which the students 
have achieved the operationally defined objectives of instruction. Sufficient progress has 
been made in this direction, certainly, to make it a functional alternative, and, for me at 
least, a preferable one. 

There are four appendices, the first two of which are computational (square root and 
correlation coefficient). The third is a listing and evaluative description of some of the more 
widely used tests, and the fourth is a list of seven prominent test distributing agencies, with 
a description of the kinds of services they offer. 


John E. Milholland 


University of Michigan


Decision Processes, THRALL, R. M., COOMBS, C. H., AND DAVIS, R. L., Editors, John
Wiley and Sons, New York, 1954, viii + 332, $5.00.


Decision Processes, while nominally a book, is in fact a one-issue journal consisting 
of nineteen mathematical and experimental papers on statistical decision theory, game 
theory, learning theory, and measurement theory (including utility measurement)—all 
parts of an area well described by the title. The work stems from an eight-week summer 
conference on “The Design of Experiments in Decision Processes” held at the RAND 
Corporation in 1952. On the grounds that such a book will not be definitive and that 
research activity in the area is lively, the editors felt that “an informal and relatively 
speedy method of printing” was justified. While agreeing with their conclusion, two ques- 
tions can be raised: Did these considerations actually force the publisher to employ such 
an unattractive format? And do not these same reasons, plus the desirability of the lowest 
possible price for a volume soon to be antedated, suggest paper, not cloth, covers? 

The volume begins with an introduction by R. L. Davis, which outlines the area, 
cites a bit of its history, and sketches the major focus and results of each paper. A clear 
notion of the relevance of this book to one’s interests can be obtained by reading these 
eighteen pages. The next article, also introductory in nature, “Some Views on Mathe-
matical Models and Measurement Theory” by C. H. Coombs, Howard Raiffa, and R. M.
Thrall is divided into two parts. The first offers a highly idealized scheme of scientific 
research with particular emphasis on the role of mathematical models. The second part, 
on measurement models, is presented as an exemplification of the general scheme; it should 
serve as a handy reference of possible scales which, by being more complete, supplements 
Stevens’ widely known classification. Definitions and social science illustrations are given 
of transitive relation, partial order, weak order, lattice, vector space, etc.; the interrelations 
among them are discussed and neatly summarized in a diagram. 

The remaining articles are grouped in four sections: individual and social choice, 
learning theory, theory and applications of utility, and experimental studies. Since it is 
impossible to discuss them all in detail, attention will be restricted to those the reviewer 
found particularly satisfying or stimulating; as it happens all four divisions of the book 
are represented. 

L. A. Goodman’s paper “On Methods of Amalgamation,” John Milnor’s “Games 
Against Nature,” and the “Note on Some Proposed Decision Criteria” by Roy Radner
and Jacob Marschak are all concerned with decision criteria for the selection of a strategy 
in a game against nature. Goodman offers a new criterion which, simultaneously, generalizes 
those of LaPlace, Bayes, and Copeland. Radner and Marschak present an example which 
suggests that both the Hurwicz generalization of the Wald minimax criterion and the 
Savage minimax regret criterion may be inadequate, and, as we shall see, Milnor’s work 
raises similar doubts. The Hurwicz criterion leads to a decision distinctly at variance with 
common sense, and the Savage criterion depends on irrelevant alternatives, in a sense 
analogous to Arrow’s usage. Milnor’s paper, the most interesting and elegant of the three, 
overlaps the others, covering the LaPlace, Wald, Hurwicz, and Savage criteria. Milnor 
lists eleven axioms a criterion might meet, and he shows which are met by the four criteria 
mentioned, and which characterize each of the four. It is striking that all but the LaPlace 
criterion fail to meet a Pareto condition on strategies (domination), and that the LaPlace 
criterion fails on another axiom, which, while not so basic, seems desirable. Furthermore, 
no criterion can meet all eleven axioms, so one is led to consider classes of criteria defined 
by subsets of axioms which seem intuitively necessary. Milnor selects five as essential and 
three others as desirable; he shows that the class so defined is non-empty. Finding a simple 
characterization of this class of criteria, or indeed of any member of the class, remains 
an unsolved problem. 

The first paper of part II, “A Formal Structure for Multiple-Choice Situations”
by R. R. Bush, Frederick Mosteller, and G. L. Thompson, is a welcome concise statement
of the mathematical structure of the Bush-Mosteller stochastic learning model. As is well 
known, the model can be stated in very general terms, but most of the results and appli- 
cations assume linear operators. A major and controversial part of the paper is an attempt,
by means of the “combining of classes” condition, to give a more respectable basis
for this assumption than the intriguing observation that it works. Roughly, this condition
requires that the model yield the same results whether or not two alternatives with the 
same set of outcome probabilities are combined. At first glance this seems to have the 
same status and intuitive necessity as, say, the requirement that the laws of physics shall 
be independent of the position of the observer; to the extent it has this status and necessity 
it is exciting. Careful inquiry, however, suggests otherwise, for the probabilities relating 
outcomes to alternatives are under the arbitrary control of the experimenter; hence, 
the model must allow for any possible combining of classes. It appears to the reviewer 
that this is too demanding to be considered intuitively necessary, and thus is not really a 
justification for the linearity assumption. Still, a persuasive justification is needed, for the 
linear model fits an impressive collection of data. An example of such data is presented in 
“Individual Behavior in Uncertain Situations: An Interpretation in Terms of Statistical 
Association Theory” by W. K. Estes. 
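
The flavor of the combining-of-classes requirement can be seen in a small numerical sketch. It assumes the familiar linear-operator form p -> alpha * p + (1 - alpha) * lam, and it simplifies the condition to "applying the operator and then merging two alternatives gives the same result as merging first"; the role of outcome probabilities is ignored. The probability values, the function names, and the nonlinear rule used for contrast are hypothetical and do not come from the paper.

```python
from fractions import Fraction as F


def linear_operator(p, alpha, lam):
    """Linear operator: move p toward the limit vector lam by the fraction 1 - alpha."""
    return [alpha * pi + (1 - alpha) * li for pi, li in zip(p, lam)]


def square_and_renormalize(p):
    """A hypothetical nonlinear rule, included only for contrast."""
    squares = [pi * pi for pi in p]
    total = sum(squares)
    return [s / total for s in squares]


def combine(p, i, j):
    """Merge alternatives i and j (with i < j) by adding their probabilities."""
    merged = [x for k, x in enumerate(p) if k not in (i, j)]
    merged.insert(i, p[i] + p[j])
    return merged


# Hypothetical response probabilities over three alternatives.
p = [F(1, 2), F(1, 3), F(1, 6)]
alpha = F(4, 5)
lam = [F(1), F(0), F(0)]

# Linear rule: the order of operating and combining does not matter.
linear_then_combine = combine(linear_operator(p, alpha, lam), 1, 2)
combine_then_linear = linear_operator(combine(p, 1, 2), alpha, combine(lam, 1, 2))

# Nonlinear rule: the two orders disagree.
nonlinear_then_combine = combine(square_and_renormalize(p), 1, 2)
combine_then_nonlinear = square_and_renormalize(combine(p, 1, 2))
```

Under these assumptions the linear rule gives the same two-class probabilities in either order, while the contrasting nonlinear rule does not. This shows only that the requirement has some bite, not that it is intuitively necessary.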

Part III, on utility, includes two papers on the existence of utility functions; these 
papers are interesting but mathematically the most difficult in the book. The first, “Representation 
of a Preference Ordering by a Numerical Function” by Gerard Debreu, is concerned 
with topological conditions on a weakly-ordered set which are sufficient to insure 
the existence of a utility function. If certain sets are closed, he shows that either separa- 
bility and connectedness or perfect separability are sufficient. No algebra of probability- 
combining is assumed as in the von Neumann and Morgenstern theory, but no unique 
results are obtained. In “Multidimensional Utilities” Melvin Hausner examines the effect 
of dropping the Archimedean axiom from the von Neumann and Morgenstern axioms. 
Let ApB denote a probability combination of A and B; the axiom requires that if A is 
preferred to B, and B to C, then ApC and B are indifferent for some p. The possible objection 
to the axiom is seen when one lets A = five cents, B = two cents, and C = death. Hausner 
obtains the elegant results that any non-Archimedean “mixture” space satisfying the other 
von Neumann and Morgenstern axioms can be imbedded in an ordered vector space, and 
that any ordered vector space is lexicographically ordered in some basis. Some interesting 
applications of this theory are suggested by R. M. Thrall in “Application of Multidimen- 
sional Utility Theory.” 
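
The five cents, two cents, and death example can be worked through with a two-component lexicographic utility. The sketch below is illustrative only: the utility vectors, the names, and the convention that p is the weight placed on A in ApC are assumptions, not Hausner's notation. Python compares tuples lexicographically, which serves here as the ordering.

```python
from fractions import Fraction


def mix(a, p, c):
    """Utility vector of the probability combination: a with weight p, c with weight 1 - p."""
    return tuple(p * x + (1 - p) * y for x, y in zip(a, c))


# Hypothetical utility vectors: (survival component, money component).
A = (0, 5)   # five cents
B = (0, 2)   # two cents
C = (-1, 0)  # death

# For every p < 1 the first component of ApC is negative, so B is preferred.
below_one = [Fraction(k, 1000) for k in range(1000)]
always_worse = all(mix(A, p, C) < B for p in below_one)

# At p = 1 the combination is A itself, which is preferred to B.
at_one_better = mix(A, Fraction(1), C) > B
```

On this grid of values of p the combination ApC is never indifferent to B: it is worse for every p below one and better at p equal to one. Since the first component of ApC is negative for any p below one, the same holds off the grid, so the Archimedean axiom fails for this ordering.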

“Towards an Economic Theory of Organization and Information” by Jacob Marschak 
initiates a fascinating normative study of decision-making by communicating “teams,” 
where teams are defined to be groups with identical individual and group utility functions. 
A team may collect data, transmit information over a communication network at some 
cost, and take actions based on a decision rule. Three classes of problems are considered 
for a team which completes all observation before making any decisions. 1) Procedural: 
given a network and cost of communication, to select the best rules for governing informa- 
tion transmission and actions. 2) Network: given rules and a cost function over networks, 
to select the best communication network. 3) Constitutional: to select the best procedural- 
network pair. Several simple special cases are solved, but as Davis notes (p. 13): “The 
relatively difficult manipulations required even for these simple cases show for one thing 
how desirable further development and simplification of the theory would be, while on 
the other hand they serve to emphasize how difficult would be any analysis at all without 
the machinery of this formalization.” 

In the final experimental section, two of the four papers deal with coalition formation 
in the game-theory sense; both emphasize that psychological rather than “objective” 
utilities are necessary for a descriptive theory. In “Tendencies Toward Group Comparability 
in Competitive Bargaining,” Paul Hoffman, Leon Festinger, and Douglas Lawrence 
employ Festinger’s psychological theories of group behavior to predict that those who are 
perceived as superior in an ability relevant to the conflict of interest involved tend to be 
excluded from effective bargaining. The confirming experiment was based on a symmetric 
3-person game. One player was always a stooge who, in one variation, appeared to be of 
similar intelligence to the subjects, but who, in the second variation, was evidently of 
superior ability. In the latter case he was excluded from coalitions more often than in the 
former, the degree increasing with the importance subjects placed on the game situation. 
These results strongly suggest that utility functions are subject to modification by psycho- 
logical manipulations—an unfortunate complication. More directly related to game 
theory itself is the paper “Some Experimental n-Person Games” by G. Kalisch, J. W. 
Milnor, J. Nash, and E. D. Nering. Several n-person games (n = 4, 5, 7) were run in 
characteristic function form, i.e., payments were stated for each possible coalition. In 
each case subjects bargained for 10 minutes, and they reported their agreements to an 
umpire who enforced them. Considering the rationality assumptions of the theory, the 
time limit seems questionable. The principal results appear to be: contrary to theory, 
strategically equivalent games were treated differently; the Shapley value tended to be 
similar, though by no means identical, to the experimental payments; no satisfactory 
method was devised to check the von Neumann and Morgenstern theory of solutions. If 
the authors intended to show that objective payments rather than subjective utilities are 
sufficient for descriptive purposes, the first result is most disturbing. The failure of the 
subjects to respond to the objective situation is further confirmed by the authors’ observa- 
tion that the subjects tended to form coalitions having large payments without regard to 
benefits resulting from other apparently less impressive coalitions. While the prospects of 
positive findings are not great, the experiment probably should be replicated under more 
carefully controlled conditions and using many more subjects. At that time data could be 
collected from the subjects prior to each run as to their perceptions of relative coalition 
strength per coalition member. We do not expect these to be the same as the “rational” 
ordering derived from the objective characteristic function, but it might be possible to 
establish that their bargaining behavior is consistent with their orderings. Certainly these 
two experiments reinforce the contention of von Neumann and Morgenstern that an in- 
dividual’s utility function need not be simply related to any objective measure arising from 
the situation. 
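
For readers unfamiliar with the Shapley value mentioned above, it can be computed directly from a characteristic function by averaging each player's marginal contribution over all orders of arrival. The sketch below uses a hypothetical 3-person game; the coalition payments and names are invented for illustration and are not the games run by Kalisch, Milnor, Nash, and Nering.

```python
from fractions import Fraction
from itertools import permutations


def shapley_value(n, v):
    """Average marginal contribution of each of n players over all arrival orders."""
    totals = [Fraction(0)] * n
    orders = list(permutations(range(n)))
    for order in orders:
        coalition = frozenset()
        for player in order:
            joined = coalition | {player}
            totals[player] += v(joined) - v(coalition)
            coalition = joined
    return [t / len(orders) for t in totals]


# Hypothetical characteristic function: a payment for each possible coalition.
worth = {
    frozenset(): 0,
    frozenset({0}): 0,
    frozenset({1}): 0,
    frozenset({2}): 0,
    frozenset({0, 1}): 4,
    frozenset({0, 2}): 3,
    frozenset({1, 2}): 2,
    frozenset({0, 1, 2}): 6,
}


def v(coalition):
    return worth[frozenset(coalition)]


values = shapley_value(3, v)
```

The values sum to the payment of the full coalition, so they describe one division of it; an experiment of the kind discussed compares such a division with the payments the subjects actually negotiated.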

In summary, we may agree with the editors that the book is not definitive and yet 
recommend it as stimulating and useful for those working in the area. Anyone attracted 
by any one of the papers will surely be interested in several others, and he may very well 
have a passing curiosity about most of them. 


Center for Advanced Study in the Behavioral Sciences R. Duncan Luce 